Police stops to reduce crime: A systematic review and meta‐analysis

Abstract Background Police‐initiated pedestrian stops have been one of the most widely used crime prevention tactics in modern policing. Proponents have long considered police stops to be an indispensable component of crime prevention efforts, with many holding them responsible for the significant reductions in violent crime observed across major US cities in recent decades. Critics, however, have taken issue with the overuse of pedestrian stops, linking them to worse mental and physical health, more negative attitudes toward the police, and elevated delinquent behavior among individuals directly subject to them. To date, there has been no systematic review or meta‐analysis on the effects of these interventions on crime and individual‐level outcomes. Objectives To synthesize the existing evaluation research regarding the impact of police‐initiated pedestrian stops on crime and disorder, mental and physical health, individual attitudes toward the police, self‐reported crime/delinquency, violence in police‐citizen encounters, and police misbehavior. Search Methods We used the Global Policing Database, a repository of all experimental and quasi‐experimental evaluations of policing interventions conducted since 1950, to search for published and unpublished evaluations of pedestrian stop interventions through December 2019. This overarching search was supplemented by additional searches of academic databases, gray literature sources, and correspondence with subject‐matter experts to capture eligible studies through December 2021. Selection Criteria Eligibility was limited to studies that included a treatment group of people or places experiencing pedestrian stops and a control group of people or places not experiencing pedestrian stops (or experiencing a lower dosage of pedestrian stops).
Studies were required to use an experimental or quasi‐experimental design and evaluate the intervention using an outcome of area‐level crime and disorder, mental or physical health, individual or community‐level attitudes toward the police, or self‐reported crime/delinquency. Data Collection and Analysis We adopted standard methodological procedures expected by the Campbell Collaboration. Eligible studies were grouped by conceptually similar outcomes and then analyzed separately using random effects models with restricted maximum likelihood estimation. Treatment effects were represented using relative incident rate ratios, odds ratios, and Hedges' g effect sizes, depending on the unit of analysis and outcome measure. We also conducted sensitivity analyses for several outcome measures using robust variance estimation, with standard errors clustered by each unique study/sample. Risk of bias was assessed using items adapted from the Cochrane randomized and non‐randomized risk of bias tools. Results Our systematic search strategies identified 40 eligible studies corresponding to 58 effect sizes across six outcome groupings, representing 90,904 people and 20,876 places. Police‐initiated pedestrian stop interventions were associated with a statistically significant 13% (95% confidence interval [CI]: −16%, −9%, p < 0.001) reduction in crime for treatment areas relative to control areas. These interventions also led to a diffusion of crime control benefits, with a statistically significant 7% (95% CI: −9%, −4%, p < 0.001) reduction in crime for treatment displacement areas relative to control areas. However, pedestrian stops were also associated with a broad range of negative individual‐level effects. Experiencing a police stop was associated with a statistically significant 46% (95% CI: 24%, 72%, p < 0.001) increase in the odds of a mental health issue and a 36% (95% CI: 14%, 62%, p < 0.001) increase in the odds of a physical health issue, relative to control.
Individuals experiencing police stops also reported significantly more negative attitudes toward the police (g = −0.38, 95% CI: −0.59, −0.17, p < 0.001) and significantly higher levels of self‐reported crime/delinquency (g = 0.30, 95% CI: 0.12, 0.48, p < 0.001), equating to changes of 18.6% and 15%, respectively. No eligible studies were identified measuring violence in police‐citizen encounters or officer misbehavior. While eligible studies were often considered to be at moderate to high risk of bias toward control groups, no significant differences based on methodological rigor were observed. Moderator analyses also indicated that the negative individual‐level effects of pedestrian stops may be more pronounced for youth, and that significant differences in effect sizes may exist between US and European studies. However, these moderator analyses were limited by a small number of studies in each comparison, and we were unable to compare the effects of police stops across racial groupings. Authors' Conclusions While our findings point to favorable effects of pedestrian stop interventions on place‐based crime and displacement outcomes, evidence of negative individual‐level effects makes it difficult to recommend the use of these tactics over alternative policing interventions. Recent systematic reviews of hot spots policing and problem‐oriented policing approaches indicate a more robust evidence‐base and generally larger crime reduction effects than those presented here, often without the associated backfire effects on individual health, attitudes, and behavior. Future research should examine whether police agencies can mitigate the negative effects of pedestrian stops through a focus on officer behavior during these encounters.
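The Data Collection and Analysis step above pools effect sizes with random-effects models estimated by restricted maximum likelihood (REML), and the attitude and self-reported delinquency results are expressed as Hedges' g. As an illustration only, the sketch below implements both textbook calculations in plain Python: Hedges' g with the usual small-sample correction, and random-effects pooling in which the between-study variance τ² is estimated by the standard REML fixed-point iteration. The function names and input values are hypothetical, this is not the authors' code, and it omits the robust variance estimation used in the sensitivity analyses.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Hedges' g (treatment minus control) and its sampling variance."""
    df = n_t + n_c - 2
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / sd_pooled
    j = 1.0 - 3.0 / (4.0 * df - 1.0)  # small-sample bias correction factor
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2.0 * (n_t + n_c))
    return j * d, j ** 2 * var_d


def reml_random_effects(effects, variances, tol=1e-10, max_iter=1000):
    """Random-effects pooling with tau^2 estimated by REML.

    effects are on an additive scale (e.g., g or log odds ratios);
    variances are their within-study sampling variances.
    Returns (pooled_effect, standard_error, tau2).
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")

    def weights(tau2):
        return [1.0 / (v + tau2) for v in variances]

    def weighted_mean(w):
        return sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

    tau2 = 0.0
    for _ in range(max_iter):
        w = weights(tau2)
        mu = weighted_mean(w)
        sum_w2 = sum(wi ** 2 for wi in w)
        # Fixed-point form of the REML estimating equation for tau^2.
        update = sum(
            wi ** 2 * ((yi - mu) ** 2 - vi)
            for wi, yi, vi in zip(w, effects, variances)
        ) / sum_w2 + 1.0 / sum(w)
        update = max(0.0, update)  # a variance cannot be negative
        converged = abs(update - tau2) < tol
        tau2 = update
        if converged:
            break

    w = weights(tau2)
    return weighted_mean(w), math.sqrt(1.0 / sum(w)), tau2


# Hypothetical attitude scores: stopped (treatment) vs. not stopped (control).
g1, v1 = hedges_g(3.2, 1.0, 50, 3.6, 1.0, 50)
g2, v2 = hedges_g(2.9, 1.1, 120, 3.4, 1.2, 110)
g3, v3 = hedges_g(3.5, 0.9, 80, 3.6, 0.9, 75)
pooled_g, se, tau2 = reml_random_effects([g1, g2, g3], [v1, v2, v3])
low, high = pooled_g - 1.96 * se, pooled_g + 1.96 * se
print(f"pooled g = {pooled_g:.2f} (95% CI {low:.2f} to {high:.2f}), tau^2 = {tau2:.3f}")
```

The same pooling function can be applied to log odds ratios; a pooled log odds ratio μ corresponds to a percent change in odds of (e^μ − 1) × 100, so a pooled odds ratio of 1.46 is the 46% increase in the odds of a mental health issue reported above.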

1 | PLAIN LANGUAGE SUMMARY
1.1 | Police stops are associated with reductions in crime but also a broad range of negative individual-level outcomes
Police stop interventions produce meaningful and significant reductions in crime without evidence of spatial displacement. However, being stopped by police is associated with significantly worse mental and physical health outcomes, more negative attitudes toward police, and higher levels of self-reported crime/delinquency. For some outcome measures, the negative effects of pedestrian stops are considerably more pronounced for youth, though the data did not permit a comparison of individual-level effects by race.

| What is this review about?
Police stops have become one of the most controversial yet widely used crime prevention strategies in modern policing. This intervention involves the police-initiated stop of an individual (or group of individuals) on the street, for the purpose of investigation and/or questioning. Police stops have been commonly used as a tactic to combat violent and gun-related crime.
The current review assesses the effect of police stops (used interchangeably here with "pedestrian stops") on both place-based and person-based outcomes, including crime, spatial displacement, mental health, physical health, attitudes toward the police, and self-reported crime/delinquency.

What is the aim of this review?
This Campbell systematic review examines the effects of police-initiated pedestrian stops on both place-based and person-based outcomes. It synthesizes results from 40 studies across six outcome groupings. Studies were predominantly conducted in the USA.

| What studies are included?
Forty studies published between 1970 and 2021 are included in this review. Eligibility was limited to experimental and quasi-experimental studies with a treatment group of people or places that experienced police stops and a control group of people or places that did not experience police stops (or experienced a lower dosage of stops).
Studies focusing only on police-initiated traffic stops were excluded from this review. Only one eligible study was a randomized controlled trial, 33 studies were conducted in the USA, and seven were conducted in Europe.

| What are the main findings of this review?
Police stop interventions lead to significant reductions in area-level crime with evidence of a diffusion of crime control benefits to nearby areas. However, methodological difficulties limit the strength of the causal inferences derived from these studies; further research is needed.
Being stopped by police is associated with significantly higher odds of both mental and physical health issues, significantly more negative attitudes toward the police, and elevated levels of self-reported crime/delinquency. The impact of a direct stop experience on mental health issues is also considerably larger for youth than for adults.
Despite this finding, place-based studies incorporating community surveys suggest that stop interventions do not impact community-level attitudes toward the police, and thus the negative effects of these interventions may be limited to the individuals directly experiencing them.
The findings of this review should be interpreted with caution, however, as only one randomized experiment assessing crime prevention outcomes was identified, and person-based studies were often unable to establish temporal ordering between the treatment and outcome measures.

| What do the findings of this review mean?
Policing efforts focused on high-volume pedestrian stops are likely to reduce crime but may do so at the cost of negative health outcomes, negative attitudes toward the police, and higher levels of delinquency for individuals subject to the intervention. Given the net-widening effects of pedestrian stops (i.e., only a small proportion of stops lead to arrests or weapon seizures), these interventions may produce more harm than good. Police agencies should carefully weigh the potential benefits and harms associated with these interventions.
Furthermore, recent reviews on tactics such as hot spots policing and problem-oriented policing have demonstrated larger reductions in crime without similar backfire effects. The evidence base for these tactics is also of considerably higher methodological rigor, generating stronger conclusions regarding program effectiveness. While it is possible that police agencies can mitigate the negative effects of pedestrian stops through a focus on improving officer conduct during police-citizen encounters, this review is unable to provide evidence of this effect.
2 | BACKGROUND
2.1 | The problem, condition, or issue
The use of pedestrian stops has been one of the most common yet controversial proactive strategies in modern policing (Weisburd & Majmundar, 2018). The pedestrian stop (also known as stop and frisk, Terry stops, street pops, stop and search, street stops, etc.) is often defined as the process by which "officers stop, and potentially question and search, people in the communities they are patrolling" (Lachman et al., 2012, p. 1). These tactics have been a staple in policing for generations, but they gained legitimacy with the landmark US Supreme Court decision in Terry v. Ohio (1968), which allows police officers discretion to conduct an investigatory stop of an individual given reasonable suspicion that the individual has committed or is in the process of committing a crime, and discretion to frisk (or pat down) the individual given reasonable suspicion that they are carrying a weapon (see Jones-Brown et al., 2010).
Evidence suggests that many US police departments began using pedestrian stops, often termed "stop, question, and frisk (SQF)" (Rosenfeld & Fornango, 2014, p. 96), widely as a proactive policing strategy in the 1990s and early 2000s (Gelman et al., 2007; White & Fradella, 2016). In New York City alone, recorded SQFs increased from 160,851 in 2003 to 685,000 in 2011 (Weisburd et al., 2016), and similar increases have been noted in other US cities such as Philadelphia and Los Angeles (Jones-Brown et al., 2010; Saul, 2016). Police "stop and search" (McCandless et al., 2016, p. 2) powers have also been noted in the UK, where targeted pedestrian stops have been used as a strategy to reduce knife crime, and in other European countries such as Bulgaria, Hungary, and Spain, often for the purpose of conducting identity checks related to criminal investigations (Miller et al., 2008). In this context, pedestrian stops have been used as primary components in various proactive policing interventions, including crackdowns (Sherman, 1990), efforts to reduce illegal gun carrying (Koper & Mayo-Wilson, 2006), directed patrol interventions (Ratcliffe et al., 2011), and hot spots policing interventions.
While advocates have considered pedestrian stops to be a contributing factor to decreasing levels of crime in American cities (Baker & Goldstein, 2012), critics have pointed to the low success rates (i.e., low proportions of stops that lead to arrest or weapon seizure) and racial disparities associated with these strategies as evidence that such tactics represent an illegal and unjust use of police power (Fagan & Davies, 2000; Gelman et al., 2007). Racial and ethnic profiling has also been a concern internationally, with researchers noting racially disparate stop rates in several European countries without clear evidence that these strategies have produced meaningful crime reductions (McCandless et al., 2016; Miller et al., 2008). Additionally, academic and social discourse has highlighted the potential deleterious effects of pedestrian stops on outcomes such as mental and physical health (see Geller et al., 2014; McFarland et al., 2019), attitudes toward the police (see Harris & Jones, 2020; Rosenbaum et al., 2005; Tyler et al., 2014), and even future delinquency and offending (Wiley et al., 2013). In other words, though the goal of pedestrian stops may be to produce a general deterrent effect, the intervention may also produce latent backfire effects for the individuals directly subjected to them.
Despite such challenges, practitioners still view pedestrian stops as an important element of proactive crime prevention efforts (D'Onfrio, 2019; Terkel, 2013), making an understanding of their effects on crime, individuals, and the larger community increasingly important. Studying the crime reduction effects of pedestrian stop tactics has been difficult, however, given that stops have been used as components of numerous different interventions and have been evaluated using a variety of different techniques (see Koper & Mayo-Wilson, 2006; MacDonald et al., 2016; Sherman, 1990; Smith & Purtell, 2008; Weisburd et al., 2016). Thus, the current work attempts to fill this gap in the evidence by conducting a systematic review and meta-analysis on the impact of pedestrian stops as a proactive policing strategy for reducing crime. Additionally, we seek to examine the effects of pedestrian stops on both the individuals and communities subjected to these strategies.

| The intervention
Pedestrian stops involve the police-initiated stop of an individual (or group of individuals) on the street for the purpose of investigation and/or questioning (Lachman et al., 2012). In most cases, the officer must have reasonable suspicion that a person is involved in criminal activity for a stop to occur, and based on the level of suspicion, a frisk or search of the person may be conducted. However, in certain contexts stops may be conducted without suspicion or the threshold for reasonable suspicion may vary. In the UK, the Criminal Justice and Public Order Act of 1994 permits suspicion-less stops in high-risk areas with approval from an authorizing officer (Lennon, 2013, 2015). Police officers in the UK and other European countries are also permitted to conduct suspicion-less stops of people in authorized areas as a proactive counter-terrorism measure (Lennon, 2013).
Similarly, the US Supreme Court has ruled that the amount of crime in a given area can be used as a factor in an officer's determination of reasonable suspicion (Gelman et al., 2007;Illinois v. Wardlow). Thus, it is important to note that while pedestrian stops are often reactive in nature, in that they require prior indication of suspicious behavior or criminal activity, they may also be used proactively. In this regard, it is important to distinguish between pedestrian stops at the individual level and pedestrian stops as employed in proactive policing interventions. Proactive policing involves "policing strategies that have as one of their goals the prevention or reduction of crime and disorder and that are not reactive in terms of focusing primarily on uncovering ongoing crime or on investigating or responding to crimes once they have occurred" (Weisburd & Majmundar, 2018, p. 1). Thus, while pedestrian stops conducted in response to observed or reported criminal behavior are reactive in nature, using pedestrian stops as part of a coordinated effort to deter or prevent crime is consistent with the tenets of proactive policing.
Pedestrian stops may be employed as distinct proactive policing strategies or used as components of larger interventions such as short-term police crackdowns (Sherman, 1990), directed patrol presence (McGarrell et al., 2002; Ratcliffe et al., 2011), or hot spots policing. While pedestrian stops have primarily been implemented as a tactic to reduce violent and/or weapon-related crime (Koper & Mayo-Wilson, 2006; Ratcliffe et al., 2011), they have also been used to target other crime/disorder problems (e.g., drug-related crime; see Geller & Fagan, 2010; Levine & Small, 2008). Additionally, natural variation in the use of pedestrian stops across geographic areas and/or police jurisdictions means that certain individuals are exceedingly likely to be subject to stops, while others are not (see Fagan & Davies, 2000). This draws attention to the importance of both the individual and community-level elements of the intervention.
Pedestrian stops represent a policing tactic acutely targeted at specific people, despite the intent to produce larger community and area-level reductions in crime and disorder. The current review includes any policing intervention employing pedestrian stops as a primary component, regardless of what (if any) specific crime/disorder outcome is being targeted. Here, the term "policing intervention" refers both to specific programmatic approaches targeted at particular areas (e.g., hot spots or hot neighborhoods) and to natural variation in, or the generalized use of, pedestrian stops as a crime prevention approach (similar to the use of preventive patrols to reduce crime in a city). Thus, the current review examines both place-based and individual-level impacts of the intervention.

| How the intervention might work
It has often been argued that offenders weigh the potential costs and benefits associated with a criminal act. Accordingly, individuals may be deterred from committing crime in situations where the potential costs of crime outweigh the potential benefits (Beccaria, 1986; Bentham, 1988; Durlauf & Nagin, 2011; Nagin, 2013). Pedestrian stops may deter crime by increasing these perceived costs, and in particular the perceived certainty of apprehension if a crime is committed (Lachman et al., 2012). In other words, people who have been personally stopped by the police may alter their behavior or avoid the area where the stop occurred to mitigate their risk of punishment, while people who become vicariously aware of the pedestrian stop intervention may pre-emptively do the same. If pedestrian stops result in the seizure of weapons or other items that are used to commit crime, they may also produce an incapacitation effect by preventing access to the tools needed to commit criminal acts. Alternatively, it is possible that pedestrian stop strategies deter crime merely through increasing police presence in high-crime areas. In this context, the deterrent effect is not necessarily related to the strategy itself, but rather to the increased police visibility in the area.
It is key in any policing program to disentangle the impacts of specific policing strategies on both the individuals targeted and the communities in which they are applied. Advocates of pedestrian stops focus on the benefits of reduced crime in the community (D'Onfrio, 2019; Terkel, 2013). However, other research suggests that pedestrian stops are often perceived as unfair or unlawful, producing backfire effects on community attitudes toward the police (Miller et al., 2000; Tyler et al., 2014). That is, police-initiated stops may reduce feelings of police legitimacy among the individuals stopped or the communities in which stops are implemented.
Underlying this is a deep-seated distrust of policing and a history of perceived oppression within high-crime minority communities. Depending on the nature of the interaction, individuals may feel that they are being stopped without proper cause and/or that their personal freedom is being unjustly restricted, leading to less favorable attitudes toward the police (see Baćak & Apel, 2021; Harris & Jones, 2020; Tyler et al., 2014).
For instance, research has suggested that in New York City, Black individuals are over six times more likely to be stopped by police than White individuals, and that the rate of success during these stops (operationalized as the rate of drug/weapon seizures or arrests) is often less than 3% for seizures and 7% for arrests (see Geller & Fagan, 2010; Gelman et al., 2007; Jones-Brown et al., 2010).
Thus, the vast majority of police stops appear to be conducted against members of disadvantaged populations who are not committing an arrestable offense and are carrying neither weapons nor contraband.
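The step from the reported success rates to "the vast majority" can be made explicit with a simple bound. Assuming, as a simplification, that both rates apply to the same set of stops, the probability that a stop yields a seizure or an arrest cannot exceed the sum of the two rates (Boole's inequality), whether or not the two outcomes overlap:

```latex
P(\text{seizure} \cup \text{arrest})
  \le P(\text{seizure}) + P(\text{arrest})
  < 0.03 + 0.07 = 0.10
\quad\Longrightarrow\quad
P(\text{neither}) = 1 - P(\text{seizure} \cup \text{arrest}) > 0.90
```

Under that assumption, more than 90% of such stops produce neither a seizure nor an arrest.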
There is also evidence to suggest that pedestrian stops can have deleterious effects on individuals' mental and physical health. Stops are often perceived as traumatic, invasive, and stressful, linking them to worsening anxiety, trauma, depression, sleep behavior, and physical functioning (Baćak & Apel, 2020; Geller et al., 2014; Hirschtick et al., 2020; Testa et al., 2021). In addition, pedestrian stops may be conducted in a rough manner, leading to uses of force that result in physical injury to the individual stopped (Brunson & Weitzer, 2009; Levine & Small, 2008). If these experiences happen in large numbers, vicarious knowledge of such incidents may further impact community perceptions of the police (Miller & D'Souza, 2015). These deleterious effects may also extend to behavioral patterns. Labeling theorists suggest that the imposition of a criminal sanction leads to the internalization of a deviant identity, socialization with deviant peers, and even defiance toward conventional society (Lemert, 1951; Sherman, 1993; Paternoster & Iovanni, 1989). Under this framework, contact with the criminal justice system only serves to worsen future behavior (Schur, 1973), and thus aggressive police stops may elevate individual-level delinquent/criminal offending (see Lee et al., 2017; Wiley et al., 2013).
PETERSEN ET AL. | 5 of 42
Concern regarding the negative latent effects of pedestrian stops is particularly salient among certain sub-populations of people. Adolescent youth are in a critical developmental period and may be particularly susceptible to stressful/traumatic events and deviant labeling (Geller, 2017). In addition, racial minorities are disproportionately exposed to proactive policing tactics such as pedestrian stops. Given a history of mistreatment and abuse at the hands of the police, these experiences may lead to elevated levels of stress and further compound pre-existing beliefs about racial stereotyping (see Baćak & Nowotny, 2020; Geller, 2017; Wheelock et al., 2019). Thus, while pedestrian stops have a clear theoretical linkage to area-level crime reduction benefits, they also have equally clear linkages to deleterious community and individual-level outcomes.
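The deterrence mechanism outlined at the start of this subsection can be stated as a stylized decision rule. This is a textbook simplification introduced here for exposition, not a model estimated in this review; the symbols B (perceived benefit of an offense), p (perceived probability of apprehension), and S (perceived severity of the sanction) are our own notation.

```latex
\text{offend} \iff B > p\,S,
\qquad
\text{a rise in } p \text{ from } p_0 \text{ to } p_1 \text{ deters exactly those with } p_0 S < B \le p_1 S .
```

In this framing, stops contribute to deterrence mainly through p: if they raise the perceived chance of being caught, would-be offenders whose expected benefit previously exceeded the expected cost may no longer find offending worthwhile.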

| Why it is important to do the review
Proactive policing tactics play an important role in crime prevention (Skogan & Frydl, 2004; Telep & Weisburd, 2012; Weisburd & Eck, 2004; Weisburd & Majmundar, 2018). However, the effects of proactive interventions vary greatly by the type of intervention and the manner in which the intervention is applied. Some tactics raise critical questions about the impacts of policing on the communities that they serve and the individuals subject to the intervention (Tyler et al., 2014).
Police have long felt that pedestrian stops can have an important general and specific deterrent value in preventing crime.
Research evidence supporting this view began to develop in the 1990s with evaluations of police crackdowns (Sherman, 1990).
There is evidence that many cities across the US were using pedestrian stops as a key crime prevention tool (Gelman et al., 2007; White & Fradella, 2016), and indeed the use of pedestrian stops has often correlated with decreasing crime in major US cities. But a rigorous assessment of the crime prevention outcomes associated with pedestrian stops has not been developed to date. A key contribution of our review is the attempt to identify whether pedestrian stops reduce crime and, if so, to estimate the size of that impact. Given controversies about the use of pedestrian stops as a crime prevention strategy, it is important to understand how much benefit (if any) they provide for public safety. (Bradford, 2017; Lennon & Murray, 2018; Murray et al., 2021).
Due to these concerns, pedestrian stop tactics have become extremely controversial, and recent years have seen the use of such stops decrease substantially in major cities such as New York and Philadelphia (McNeil, 2020; Weisburd et al., 2016), as well as in European countries such as England and Scotland (Lennon & Murray, 2018). There has even been a growing call among many to do away with pedestrian stop tactics entirely (see Baker & Goldstein, 2012). Yet existing reviews have often failed to find evidence of negative impacts on community evaluations of the police, though the evidence of negative effects on people who are stopped is stronger (e.g., see Weisburd & Majmundar, 2018). Thus, it is increasingly important to determine whether pedestrian stops, developed to reduce crime, produce negative consequences for the individuals and communities affected by them. To date, no review has systematically assessed these outcomes or considered them alongside one another.
Such a review is critical for informed crime prevention policy that weighs all potential costs and benefits.

| OBJECTIVES
Given that pedestrian stop tactics have garnered controversy and concern over their potential effects on crime (see MacDonald et al., 2016; Weisburd et al., 2016), the community (see Baker & Goldstein, 2012; Gelman et al., 2007; Miller et al., 2000; Tyler et al., 2014), and the individuals subject to them (see Geller et al., 2014; Geller, 2017; McFarland et al., 2019; Wiley et al., 2013), the main objective of this review is to synthesize the impact of pedestrian stops across each of these areas. Specifically, this review seeks to assess the following questions:
• What are the effects of pedestrian stop interventions on area-level crime and disorder?
• What are the effects of pedestrian stop interventions on individual and community-level attitudes toward the police?
• What are the effects of pedestrian stops on individual mental and physical health outcomes?
• What are the effects of pedestrian stops on self-reported crime and/or delinquency?
• What are the effects of pedestrian stops on violence in police-citizen encounters and officer misbehavior?
Our secondary objective, proposed at the time of protocol publication (Weisburd et al., 2021), was to examine whether the effects of police-initiated pedestrian stops vary according to the following moderating factors: research design, country, size of geographic area, crime type of focus, and racial composition. Based on the eligible studies identified, we were able to assess the degree to which heterogeneity in effect sizes might be explained by research design (e.g., matched vs. unmatched designs) and characteristics of the sample (e.g., youth vs. non-youth samples, size of the geographic area targeted).
For studies to be considered eligible for this review, the evaluation was required to include a treatment group that received a pedestrian stops intervention and a separate comparison group that did not receive a pedestrian stops intervention. Here, the treatment group could consist of either geographic areas or individuals, and eligible treatments could include proactive policing interventions, natural variation in the use of pedestrian stops across areas, or natural variation in the prevalence of police stops across individuals. In other words, we included comparisons of areas and individuals that differed naturally in their exposure to police stops, regardless of whether these differences were the result of any planned policing intervention. Eligible comparison conditions could include any group of areas or people that were not exposed to a pedestrian stops intervention or were exposed to a lower dosage of the intervention. For geographic studies, comparison conditions generally involved standard police practices, and for individual-level studies, comparison conditions generally consisted of individuals who had not directly experienced police stops. Studies were included regardless of their publication status.
Both randomized and quasi-experimental research designs were considered eligible for inclusion (Campbell & Stanley, 1966; Cook & Campbell, 1979; Shadish et al., 2002). This inclusion threshold was adapted from the inclusion criterion in the Global Policing Database (GPD) protocol (Higginson et al., 2015, pp. 47-48), which was the primary search source for this review. From the GPD, we included the following types of designs:
• Randomized controlled trials (RCTs)
• Matched control group designs with or without pre-intervention baseline measures (propensity or statistically matched)
• Unmatched control group designs with pre-intervention measures (difference-in-difference analysis)
• Unmatched control group designs with pre-post intervention measures which allow for difference-in-difference analyses
• Unmatched control group designs without pre-intervention measures where the control group has face validity
• Raw unadjusted correlational designs where the variation in the level of the intervention is compared to the variation in the level of the outcome
Thus, this review includes weaker quasi-experimental studies with "unmatched" control groups; for example, studies that compared a target area or group to the remainder of a jurisdiction or population.
Accordingly, any evaluation of pedestrian stops that included a comparison group or area that did not receive the intervention was considered eligible so long as it met our other inclusion criteria.
However, we distinguish between matched and unmatched designs in a subsequent moderator analysis (Section 5.4).
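Several of the eligible designs listed above compare pre- and post-intervention counts in treatment and comparison areas. The sketch below shows one common way to turn such counts into a difference-in-differences rate ratio with a 95% confidence interval and a percent change. It assumes Poisson-distributed counts with no overdispersion adjustment, uses a hypothetical function name (`rate_ratio_did`) and made-up counts, and should not be read as the review's exact procedure for computing relative incident rate ratios.

```python
import math


def rate_ratio_did(treat_pre, treat_post, ctrl_pre, ctrl_post, z=1.96):
    """Pre/post difference-in-differences rate ratio for one study (illustrative).

    Returns (ratio, pct_change, (ci_low, ci_high)). A ratio below 1 means
    counts in the treatment area fell by more (or rose by less) than in the
    comparison area. The log-scale variance assumes Poisson counts
    (1/a + 1/b + 1/c + 1/d) and omits any overdispersion adjustment.
    """
    counts = (treat_pre, treat_post, ctrl_pre, ctrl_post)
    if min(counts) <= 0:
        raise ValueError("all counts must be positive")
    ratio = (treat_post / treat_pre) / (ctrl_post / ctrl_pre)
    log_ratio = math.log(ratio)
    se = math.sqrt(sum(1.0 / c for c in counts))
    ci = (math.exp(log_ratio - z * se), math.exp(log_ratio + z * se))
    pct_change = (ratio - 1.0) * 100.0  # negative values indicate a reduction
    return ratio, pct_change, ci


# Hypothetical counts, not taken from any study in this review.
ratio, pct, (lo, hi) = rate_ratio_did(200, 160, 200, 190)
print(f"ratio={ratio:.3f}, change={pct:.1f}%, 95% CI=({lo:.3f}, {hi:.3f})")
```

Under this convention a ratio below 1 indicates a relative reduction in the treatment area, so a ratio of 0.87 corresponds to a 13% reduction. Some published syntheses of place-based policing inflate the log-scale variance to account for overdispersion in crime counts, which would widen the interval.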

| Types of participants
Given our interest in examining the impacts of pedestrian stops on crime, the community, and the individuals subject to these stops, this review includes the following populations:
• Law enforcement officers (including any particular race, ethnicity, gender)
• Citizens (including citizens who are the subjects of pedestrian stops or live in areas subject to stop interventions; and including any race, ethnicity, gender)
• Places (including micro-places such as street segments, clusters of addresses, police beats; meso-places such as neighborhoods and communities; or macro-places such as entire jurisdictions).

| Types of interventions
Studies that evaluated interventions in which police-initiated pedestrian stops of individuals or groups of individuals (for the purpose of questioning, investigation, and/or frisking and searching) were carried out as a major component of a policing intervention were considered eligible for this review. As previously noted, the term "intervention" included natural variation in general policing approaches throughout a jurisdiction and/or natural variation in exposure to police stops among samples of individuals.
That is, any comparison of people or places with differential exposure to pedestrian stops was considered an intervention for the purposes of this review. It is important to note here that our focus was on pedestrian stops, and as such, we excluded studies that were solely or primarily focused on traffic stops. More specifically, our interest was in isolating interventions consistent with the concept of SQF, which is traditionally associated with pedestrian stops (see Jones-Brown et al., 2010; Lachman et al., 2012). However, we did include studies in which both pedestrian and traffic stops were used, given the often-overlapping nature of these forms of policing, and so long as pedestrian stops remained a major component of the intervention.
We did not attempt to distinguish between the individual motivations behind pedestrian stops or determine whether stops were used reactively or proactively (i.e., whether stops were in response to observed criminal behavior), but rather focused on the intent of the program in which pedestrian stops were a component.
This review was not limited to interventions targeting specific types of crime or disorder (e.g., weapon and drug-related crime), or any specific type of overarching policing tactic (e.g., hot spots policing, crackdowns, directed patrol). However, we did exclude studies employing pedestrian stops in a minor capacity relative to other policing tactics (as the effects of the stop component would be difficult to isolate from the other components of the policing intervention).

| Types of outcome measures
This review included the following outcome measures. All outcomes were considered primary, and eligible studies were required to report at least one of these measures for inclusion:
• Crime and disorder (including displacement)
• Incidents of violence in police-citizen encounters
• Mental health issues
• Physical health issues

Crime/disorder and displacement outcomes were considered eligible if measured using official data (e.g., incident and arrest data, calls for service, crime rates), unofficial crime data (e.g., crime reported by civilians, self-report delinquency via questionnaires or surveys), and systematic social observations of crime. All types of crime and/or disorder were included in this review (e.g., property, drug, violent crime).
We anticipated that incidents of violence in police-citizen encounters would be measured through police use-of-force reports (Weisburd et al., 2021). We planned to measure this outcome as precisely as possible, including capturing use of force that results from suspect resistance and distinguishing varying levels of force where possible. We also note that this outcome is not necessarily a measure of unjustified use of force, and we therefore distinguished it from officer misbehavior. We anticipated that officer misbehavior would be measured through formal citizen complaints or community surveys reporting on police abuse or violence.
We included studies where fear of crime and attitudes towards police were measured using questionnaires or surveys at the community-level or taken from individuals who directly experienced police stops, as well as those who did not.
For mental and physical health issues, we included studies that measured these outcomes via self-reports taken from individuals with direct police stop experience or via official data (e.g., injury data from hospitals), and we included data measured at both the individual- and community-levels of analysis. For the purposes of this review, mental health issues were defined as symptoms or diagnoses related to an established mental health condition or a "clinically significant behavioral or psychological syndrome or pattern that occurs in an individual" (Stein et al., 2010, p. 1760), such as anxiety, post-traumatic stress disorder (PTSD), suicidality, and depression. Physical health issues concerned any characteristic or condition that could directly impact or have implications for physical functioning, such as self-reported physical health, sleep problems, and/or functional limitations (see e.g., Baćak & Apel, 2020; Testa et al., 2021).

| Duration of follow-up
Eligible studies were not restricted to any particular follow-up period.
At the geographic level, stop interventions are likely to produce short-term deterrent effects (Sherman, 1990; Weisburd et al., 2016), though the impacts on individuals directly experiencing stops may be long term (see Dennison & Finkeldey, 2021; Wiley et al., 2013). In the protocol for this review (Weisburd et al., 2021), we planned to synthesize studies by length of follow-up period (<6 months, 6-12 months, >1 year). However, this approach needed to be adapted due to the nature of included studies and is described in the results section.

| Types of settings
No restrictions were placed on geographic region, racial, ethnic, or demographic makeup, or written language. We used Google Translate to conduct title and abstract screening for any non-English language studies, as well as for the main text of any non-English language articles that required full-text review.

| Searching other resources
We used several additional strategies to supplement the approaches described above. First, we searched additional databases from Japan, Korea, the Middle East, and Europe by consulting subject guides through the Duke University Library. Specifically, we searched the following databases using keywords related to policing and pedestrian stops consistent with those described in our main search strategies:
• CiNii Articles
• Middle Eastern and Central Asian Studies
• Historical Abstracts

Second, and similar to recent reviews using the GPD (Hinkle et al., 2020; Lum et al., 2020; Mazerolle et al., 2020), we performed hand searches of published volumes of leading journals in criminology from 2019 to 2021 to identify any studies that had yet to be indexed in electronic databases. Third, we conducted forward citation searches using Google Scholar and reference harvesting of prior reviews on related topics ([…] Koper & Mayo-Wilson, 2012). Finally, after completing all searches, we e-mailed our list of eligible studies to the lead authors of these articles to identify any research that the above searches may have missed.²

| Selection of studies
All search results were first screened on title and abstract content to determine potential relevance to pedestrian stops. As an initial step, two screeners (Petersen and Fay) reviewed the same subset of 25 titles/abstracts to establish inter-rater reliability. Afterwards, the remaining results were double screened by both authors. All abstracts were reviewed using Abstrackr, which is a free online tool designed for abstract screening in systematic reviews (Wallace et al., 2012).
We then retrieved a full-text copy of all results marked as potentially relevant during title/abstract review. These results were also double screened by both reviewers. Any discrepancies in eligibility determinations or studies identified as "on the fence" were discussed among the entire research team before reaching consensus.

| Data extraction and management
Eligible studies were double coded by authors KP and SF using the […]. The research team met frequently to discuss coding items, and any discrepancies in coding were discussed among all review authors before coming to a final coding decision. EpiData Software (https://www.epidata.dk/index.htm) was used to digitize coding forms and facilitate data entry.

² Some of our email attempts were returned as undeliverable, and thus not all authors were successfully contacted.

| Assessment of risk of bias in included studies
Six items adapted from the Cochrane randomized and non-randomized risk of bias tools (Sterne et al., 2016; Sterne et al., 2019) were used to assess the potential for bias across all studies included in our meta-analysis.³ We merged and adapted these items to provide a uniform assessment of risk of bias across all included studies, and because we did not consider many of the baseline questions to be relevant to this body of research. Our modified items included: (A) whether assignment to groups was random, (B) whether there were baseline differences between groups that were unaccounted for by the analysis, (C) whether an appropriate analysis was used to control for any potential confounding variables, (D) whether there were any failures in the implementation of the intervention that were likely to affect the results, (E) whether there was reason to expect bias in the data used to evaluate the intervention, and (F) whether the researchers were able to establish proper temporal ordering between the treatment and the outcome.
Randomization was a dichotomous response (No/Yes), but all other questions were rated as either "No," "Probably no," "Probably yes," "Yes," or "No information." It is important to note here that these ratings, while assessed in duplicate, do involve an inherent element of subjectivity. Additionally, these ratings correspond only to our outcomes of interest and to the analyses from which we were able to calculate an effect size. At times, these analyses are not the primary ones reported by study authors or central to the primary purpose of the article.
Nonequivalence between groups (item B) was coded "probably yes" or "yes" if there was evidence of important baseline differences between groups that were not controlled for statistically. Otherwise, this item was coded as "probably no." The appropriateness of the statistical analysis (item C) was coded as "probably yes" for quasi-experimental studies using multiple regression or ANCOVA models, and "yes" for quasi-experimental studies using strong statistical matching procedures (e.g., propensity score matching). Quasi-experimental studies that did not control for confounding factors were rated as "probably no" or "no" for this measure. For experimental studies, the appropriateness of the analysis was coded as "yes" so long as a statistical significance test was used that did not appear to violate any necessary distributional assumptions (e.g., normality, independence). Implementation failures and data missingness (items D and E) were coded as "no" if there was high program fidelity and no evidence of missing data. Similarly, this measure was coded as "probably no" if there was no evidence that implementation issues or data missingness favored one group over the other. Finally, the ability of researchers to establish temporal ordering (item F) was coded as "no" or "probably no" for cross-sectional studies (i.e., cross-sectional surveys), and "probably yes" for longitudinal studies. Only longitudinal studies that could definitively separate the intervention and the outcome in time were coded as "yes" on this measure.
At the study-level, place-based quasi-experiments reporting evidence of uncontrolled baseline differences between groups were rated as "high risk" of bias. Quasi-experimental studies that reported either no evidence of baseline differences between groups or that statistically controlled for baseline differences were rated as "some concerns". Only place-based studies using random assignment were rated as "low risk" of bias, so long as the authors did not report evidence of significant issues with the assignment process, analysis, or program implementation.
For person-based studies, any study coded as "No" or "Probably no" on our temporal ordering measure was rated as "high risk" of bias (i.e., cross-sectional studies). Studies coded as either "Yes" or "Probably yes" on our temporal ordering measure were rated as "some concerns," so long as these studies used analytic methods that controlled for possible confounding variables (i.e., longitudinal studies using multiple regression analyses). Only longitudinal studies using strong statistical matching techniques (e.g., propensity score matching) with clear separation of treatment and outcome measures across time were rated as "low risk" of bias for person-based studies.
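The study-level rules in the two paragraphs above amount to a small decision procedure. The sketch below encodes them in Python under stated assumptions: the function names and boolean inputs are hypothetical, a "No information" response on temporal ordering is treated as high risk, and the two combinations the text does not cover (a randomized place-based study with reported problems, and a person-based study with temporal ordering but no confounder control) are assigned by assumption, as flagged in the comments. It illustrates the logic only and is not the coding instrument used in the review.

```python
from typing import Literal

Response = Literal["No", "Probably no", "Probably yes", "Yes", "No information"]


def place_based_rating(randomized: bool,
                       uncontrolled_baseline_diff: bool,
                       reported_problems: bool = False) -> str:
    """Study-level risk of bias for a place-based study (illustrative)."""
    if randomized:
        # Random assignment is "low risk" unless problems with assignment,
        # analysis, or implementation were reported (assumed: "some concerns").
        return "some concerns" if reported_problems else "low risk"
    if uncontrolled_baseline_diff:
        # Quasi-experiment with evidence of uncontrolled baseline differences.
        return "high risk"
    # Quasi-experiment with no evidence of baseline differences, or with
    # statistical control for them.
    return "some concerns"


def person_based_rating(temporal_order: Response, analysis: Response) -> str:
    """Study-level risk of bias for a person-based study (illustrative).

    temporal_order is item F; analysis is item C, where "Yes" is taken to
    mean strong matching (e.g., propensity scores) and "Probably yes" to
    mean regression adjustment.
    """
    if temporal_order not in ("Yes", "Probably yes"):
        # Cross-sectional designs ("No"/"Probably no"); "No information"
        # is treated the same way here by assumption.
        return "high risk"
    if temporal_order == "Yes" and analysis == "Yes":
        # Longitudinal, clear separation in time, strong matching.
        return "low risk"
    if analysis in ("Yes", "Probably yes"):
        return "some concerns"
    # Assumption: temporal ordering established but no confounder control.
    return "high risk"
```

For example, a cross-sectional survey using regression adjustment, `person_based_rating("Probably no", "Probably yes")`, is rated high risk regardless of the quality of its analysis, which mirrors the emphasis the text places on temporal ordering.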

| Measures of treatment effect
The protocol for this review outlines the anticipated approach for effect size calculations based on the expected nature of the outcome measurements (Weisburd et al., 2021). This section provides a precise outline of our effect size calculations based on the studies included in the review.
Measures of treatment effect varied considerably across outcome groupings. For eligible place-based studies, effect sizes were calculated using logged relative incident rate ratios (RIRR).
These studies predominantly reported count data for treatment and control groups during pre- and post-intervention periods (or during post-intervention periods alone). Given that Cohen's d effect sizes are sensitive to the way in which counts are divided across time and space, Wilson (2022) suggests the use of the RIRR for place-based studies. The RIRR is a difference-in-difference effect size that can be expressed using the following equation:

$$\mathrm{RIRR} = \frac{T_{\mathrm{post}} / T_{\mathrm{pre}}}{C_{\mathrm{post}} / C_{\mathrm{pre}}}, \qquad V\left[\ln(\mathrm{RIRR})\right] = \frac{1}{T_{\mathrm{pre}}} + \frac{1}{T_{\mathrm{post}}} + \frac{1}{C_{\mathrm{pre}}} + \frac{1}{C_{\mathrm{post}}}$$

where $T_{\mathrm{pre}}$ and $T_{\mathrm{post}}$ are the crime counts in treatment areas before and after the intervention, and $C_{\mathrm{pre}}$ and $C_{\mathrm{post}}$ are the corresponding counts in control areas, so that an RIRR below one indicates a relative decline in crime in treatment areas. However, given that overdispersion is common in count data (see MacDonald & Lattimore, 2010), an adjustment to the variance is often necessary. Wilson (2022) recommends the following correction for overdispersion based on the quasi-Poisson model:

$$\phi = \frac{\sum_{k=1}^{4} (n_k - 1)\, S_k^2 / \bar{X}_k}{\sum_{k=1}^{4} n_k - 4}$$

where $\bar{X}_k$ is the average count for each of the four treatment/control by pre-/post-intervention cells, $S_k$ is the standard deviation of the counts contributing to that average, and $n_k$ is the number of counts contributing to the mean. If $\phi$ is greater than one, the variance is multiplied by $\phi$ to adjust for overdispersion. Unfortunately, the data necessary to correct for overdispersion were only available in a subset of our eligible place-based studies. To adjust the variance for the remaining effect sizes, we simply used the mean value of $\phi$ across the studies that presented sufficient data to calculate it.
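To make the RIRR calculation concrete, the following Python sketch computes the logged RIRR, its Poisson variance, and the quasi-Poisson dispersion adjustment. It assumes that each of the four cells is supplied as a list of per-period counts (for example, monthly counts), that every cell has at least two periods and a nonzero total, and that the formulas take the form given above. The function names and example counts are hypothetical; this is not the authors' code.

```python
import math
from statistics import mean, variance


def log_rirr(t_pre, t_post, c_pre, c_post):
    """Logged RIRR and its Poisson variance from per-period counts.

    Each argument is a list of counts for one cell; the cell totals enter
    the formula. A negative value indicates a relative decline in the
    treatment areas.
    """
    T0, T1, C0, C1 = (sum(cell) for cell in (t_pre, t_post, c_pre, c_post))
    lr = math.log((T1 / T0) / (C1 / C0))
    var = 1 / T0 + 1 / T1 + 1 / C0 + 1 / C1
    return lr, var


def dispersion(cells):
    """Quasi-Poisson dispersion (phi) pooled over the four cells."""
    num = sum((len(c) - 1) * variance(c) / mean(c) for c in cells)
    den = sum(len(c) for c in cells) - len(cells)
    return num / den


def adjust_variance(var, phi):
    """Inflate the variance only when phi indicates overdispersion."""
    return var * phi if phi > 1 else var


# Hypothetical monthly counts for the four cells.
t_pre, t_post = [20, 25, 30], [15, 18, 12]
c_pre, c_post = [22, 24, 26], [21, 25, 23]
lr, v = log_rirr(t_pre, t_post, c_pre, c_post)
v_adj = adjust_variance(v, dispersion([t_pre, t_post, c_pre, c_post]))
pct_change = (math.exp(lr) - 1) * 100  # RIRR is about 0.626, i.e., about -37%
```

For studies lacking the per-period standard deviations, the approach described in the text would replace `dispersion(...)` with the mean of the phi values computed for the other studies, e.g., `mean(phis)`.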
For most eligible studies of crime and displacement, we were able to calculate an RIRR using reported means or counts. Several studies, however, required alternate methods to obtain an effect size. Two studies reported regression coefficients from count-based models (MacDonald et al., 2016; McCandless et al., 2016), allowing us to use the logged incident rate ratio and standard error reported directly in the regression model. These regression coefficients also provided estimates that were adjusted for various confounding factors or forms of non-independence that were possible within the data. One study used a linear probability model to assess the mean difference in probability of a crime occurring for treatment areas/times compared to control areas/times (Weisburd et al., 2016). Here, we used the regression coefficient and the intercept of the regression model to construct a risk ratio.
Given that risk ratios can be considered censored counts (see Wilson, 2022), we synthesized this effect size with studies reporting count data. Finally, one study required the use of digitizing software to obtain numeric data from a line graph comparing treatment and control areas (Murray, 2014). To accomplish this, we used Engauge Digitizer, which has been recommended and used in recent meta-analyses (see No et al., 2018; Tantry et al., 2021).⁴

Mental and physical health outcomes were most frequently reported as dichotomous measures, often using some form of logistic regression. As such, we synthesized these studies using logged odds ratios (ORs). We note here that risk ratios may have been preferable given their ease of interpretation, but we often did not have the requisite data to convert reported ORs into risk ratios. In most cases, we coded ORs directly from logistic regression models and calculated the standard error of the logged OR using the reported 95% confidence interval (CI) (Dennison & Finkeldey, 2021; Hirschtick, 2017; Hirschtick et al., 2020; Jackson, Testa, Vaughn, & Semenza, 2020; Lewis & Wu, 2021; Sundaresh et al., 2020; Testa et al., 2021).⁵ However, a subset of eligible studies reported mental health outcomes using continuous or ordinal measurements (Baćak & Apel, 2020; Geller, 2017; Geller et al., 2014; McFarland et al., 2019). For these studies, we calculated Hedges' g effect sizes and converted them to logged ORs using the Cox logit method, which multiplies the standardized mean difference by 1.65 and divides its variance by 0.367 (see Sánchez-Meca et al., 2003; Wilson, 2017).
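Several of the conversions described in the two preceding paragraphs are short calculations. The Python sketch below illustrates them: a risk ratio from a linear probability model (assuming the intercept is the control-group probability and the coefficient is the treatment-control difference, which the text implies but does not spell out), the standard error of a logged OR recovered from a reported 95% CI, Hedges' g with the usual small-sample correction, and the Cox logit conversion using the constants stated in the text. Function names and example values are hypothetical, and the variance of g shown is one common textbook form; other references substitute g for d in the second term.

```python
import math

Z95 = 1.959964  # two-sided 95% standard normal quantile


def risk_ratio_from_lpm(intercept, coef):
    """Risk ratio implied by a linear probability model (assumed setup:
    intercept = control probability, intercept + coef = treatment)."""
    return (intercept + coef) / intercept


def se_log_or_from_ci(lower, upper):
    """Standard error of ln(OR) recovered from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Small-sample-corrected standardized mean difference and variance."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # Hedges' correction factor
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d


def cox_logit(g, var_g):
    """Cox logit conversion: ln(OR) = 1.65 * g, Var = var_g / 0.367."""
    return 1.65 * g, var_g / 0.367


# Hypothetical group summaries: stopped vs. not stopped on a symptom scale.
g, v = hedges_g(10, 2, 10, 8, 2, 10)
log_or, v_log_or = cox_logit(g, v)
```

Converting the continuous outcomes onto the logged OR scale in this way lets them be pooled with the dichotomous outcomes in a single model.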
Individual attitudes toward the police and self-reported crime/delinquency were generally operationalized as scaled or continuous measurements. We synthesized these studies using Hedges' g effect sizes, which represent the standardized mean difference between groups (Hedges, 1981). […] Where such a selection was not clearly possible, we prioritized the most valid effect size as determined by our risk of bias ratings. For example, a number of studies analyzed similar outcomes taken from the same longitudinal cohort surveys (see e.g., Geller, 2017; Slocum et al., 2016; Turney, 2021; Wiley et al., 2013; […]). In these situations, we selected the effect size that coders judged to be the best causal estimate, that is, the estimate that best established the elements of causality. In general, this criterion prioritized well-matched or adjusted estimates over unmatched or unadjusted estimates. At times, however, our selection of effect sizes was subjective or arbitrary. To ensure that these selections did not bias the results of our review, we conducted sensitivity analyses that incorporated all calculated effect sizes for each study/sample. These analyses used robust variance estimation (RVE), a method capable of analyzing statistically dependent data structures in meta-analysis (see Tanner-Smith et al., 2016). In the RVE model, the weight of each effect size is no longer tied directly to its own variance. Assuming a correlated data structure, each effect size is weighted by the inverse of the product of the average sampling variance (plus the between-study variance component) within its grouping unit and the number of effect sizes nested within that grouping unit (see Tanner-Smith et al., 2016).
Thus, the weight of each effect size within a study or sample will display an inverse relationship with the number of effect sizes nested within that study or sample. Additionally, all effect sizes within a grouping unit will receive the same weight. This method avoids potential issues associated with the over-representation of a sample or study due to the inclusion of multiple effect sizes. For our analyses, we assumed a correlated data structure and clustered standard errors by each unique sample (for a similar approach see Wilson et al., 2021).
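The correlated-effects weighting described above can be sketched in a few lines of Python. In this illustration, every effect size in cluster i receives weight 1 / (k_i (v̄_i + τ²)), the pooled estimate is the weighted mean, and its standard error uses the cluster-robust (sandwich) form. Two assumptions are worth flagging: τ² is supplied as an input here, whereas robumeta's `robu()` estimates it from the data given an assumed within-study correlation, and `robu()` applies small-sample corrections by default that this sketch omits. The function names are hypothetical.

```python
import math
from collections import defaultdict


def rve_weights(cluster_ids, variances, tau2):
    """Correlated-effects RVE weights: 1 / (k_i * (vbar_i + tau2))."""
    members = defaultdict(list)
    for cid, v in zip(cluster_ids, variances):
        members[cid].append(v)
    weights = []
    for cid in cluster_ids:
        vs = members[cid]
        k, vbar = len(vs), sum(vs) / len(vs)
        weights.append(1.0 / (k * (vbar + tau2)))
    return weights


def rve_mean(cluster_ids, effects, variances, tau2):
    """Weighted mean effect with a cluster-robust (sandwich) standard error.

    No small-sample correction is applied here.
    """
    w = rve_weights(cluster_ids, variances, tau2)
    sw = sum(w)
    beta = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    score = defaultdict(float)  # sum of weighted residuals per cluster
    for cid, wi, yi in zip(cluster_ids, w, effects):
        score[cid] += wi * (yi - beta)
    se = math.sqrt(sum(s ** 2 for s in score.values())) / sw
    return beta, se


# Hypothetical data: study "A" contributes two effect sizes, study "B" one.
beta, se = rve_mean(["A", "A", "B"], [0.2, 0.4, 0.0], [0.04, 0.04, 0.04], 0.0)
```

In this example each of study A's two effect sizes receives half the weight of study B's single estimate, so both studies contribute equally to the pooled mean, which is the over-representation safeguard the text describes.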

| Dealing with missing data
When studies that were otherwise eligible did not report the necessary data to calculate an effect size, we attempted to contact study authors. Ultimately, we were unable to calculate an effect size for only one eligible study that otherwise would have been included in a meta-analytic model (Alderden et al., 2011). We review the narrative results of this study and all other eligible studies not included in our meta-analysis in subsequent sections.

| Assessment of heterogeneity
We assessed heterogeneity in effect size estimates using the Q statistic, I² values, and τ² values. Here, the Q statistic tests whether the between-study variance is statistically significant (i.e., whether there is more variance than would be expected from sampling error alone), the I² value represents the percentage of total variance attributable to variance between studies, and the τ² value represents the magnitude of the random-effects variance component (see Borenstein et al., 2010; Higgins & Thompson, 2002). Additionally, we explored between-study heterogeneity using various moderator analyses (see Section 4.3.10).
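As an illustration of the Q statistic and I² described above, the Python sketch below computes both from a set of effect sizes and sampling variances, using fixed-effect (inverse-variance) weights and the Higgins and Thompson (2002) formula I² = (Q − df)/Q, truncated at zero. The function name and inputs are hypothetical. Note that metafor reports I² from the estimated τ² and a "typical" within-study variance, so with REML estimation its value can differ somewhat from this Q-based version.

```python
def heterogeneity(effects, variances):
    """Cochran's Q, its degrees of freedom, and I^2 (in percent)."""
    w = [1 / v for v in variances]  # fixed-effect (inverse-variance) weights
    mu = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2


# Hypothetical logged effect sizes with equal sampling variances.
q, df, i2 = heterogeneity([-0.5, 0.0, 0.5, 1.0], [0.01] * 4)
```

Q is compared against a chi-square distribution with `df` degrees of freedom; here Q is 125 on 3 degrees of freedom and I² is 97.6%, indicating that almost all of the observed variation reflects between-study differences.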

| Assessment of reporting biases
Three methods were used to assess the potential for reporting bias.
First, we conducted moderator analyses comparing the mean effect sizes for published and unpublished studies. Second, we generated funnel plots with trim-and-fill analyses to identify any asymmetries in effect size estimates across standard error values and to impute missing values if needed (Duval & Tweedie, 2000). Finally, we conducted Egger's regression tests to assess the linear relationship between standard error and effect size magnitude (Egger et al., 1997).
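Egger's test, as cited above, is an ordinary least squares regression of each study's standardized effect (effect divided by its standard error) on its precision (one divided by the standard error). An intercept far from zero suggests that smaller, less precise studies report systematically different effects. The Python sketch below implements that classic form; the function name is hypothetical, and metafor's `regtest()` implements related variants whose default specification differs, so its output need not match this sketch exactly.

```python
import math


def egger_test(effects, ses):
    """Egger's regression test for funnel plot asymmetry (OLS).

    Regresses effect / SE on 1 / SE. Returns the intercept, the slope,
    and the t statistic for the intercept (n - 2 degrees of freedom).
    """
    x = [1.0 / s for s in ses]                   # precision
    y = [e / s for e, s in zip(effects, ses)]    # standardized effect
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    sigma2 = sum(r ** 2 for r in resid) / (n - 2)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + xbar ** 2 / sxx))
    t = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, slope, t
```

When every study reports the same true effect with no small-study bias, the points fall on a line through the origin, so the intercept is close to zero and the slope approximates the common effect.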

| Data synthesis
Data synthesis for this review involved standard inverse-variance weighted meta-analysis. A separate model was estimated for each unique outcome construct, and all outcomes were analyzed using random effects models. The random effects variance component (τ²) for each model was derived using restricted maximum likelihood estimation. These primary analyses were conducted in R statistical software using the metafor package (Viechtbauer, 2010). Sensitivity models incorporating all calculated effect sizes were estimated using the robu() function found in the robumeta package in R statistical software (Fisher & Tipton, 2015).
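For readers less familiar with the estimator, the following is a minimal Python sketch of the random-effects model described above. It estimates τ² by iterating the REML estimating equation shown in the docstring, truncating at zero, and then computes the inverse-variance weighted mean and its standard error. metafor's `rma()` uses REML by default but relies on its own optimization algorithm and provides additional output (confidence intervals, tests), so this illustrates the estimator rather than replacing the package. The function name is hypothetical.

```python
import math


def reml_random_effects(effects, variances, tol=1e-10, max_iter=1000):
    """Random-effects pooled estimate with a REML estimate of tau^2.

    Iterates tau2 = sum(w^2 * ((y - mu)^2 - v)) / sum(w^2) + 1 / sum(w),
    with w = 1 / (v + tau2), truncating tau2 at zero.
    """
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1.0 / (v + tau2) for v in variances]
        sw = sum(w)
        mu = sum(wi * yi for wi, yi in zip(w, effects)) / sw
        sw2 = sum(wi ** 2 for wi in w)
        update = sum(wi ** 2 * ((yi - mu) ** 2 - vi)
                     for wi, yi, vi in zip(w, effects, variances)) / sw2 + 1.0 / sw
        update = max(0.0, update)
        converged = abs(update - tau2) < tol
        tau2 = update
        if converged:
            break
    # Inverse-variance weighted mean using the final tau2.
    w = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    return mu, se, tau2


# Hypothetical logged RIRRs and their variances.
mu, se, tau2 = reml_random_effects([-0.5, 0.0, 0.5, 1.0], [0.01] * 4)
```

For the place-based models, the pooled estimate is a logged RIRR, so `(math.exp(mu) - 1) * 100` expresses it as a percent change in crime for treatment areas relative to control areas.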

| Subgroup analysis and investigation of heterogeneity
Per the protocol for this review (Weisburd et al., 2021), we investigated heterogeneity across effect size estimates using a variety of additional moderator analyses. Due to the characteristics of our eligible studies and the data that were frequently reported, the moderators used for each outcome grouping differ from those listed in the protocol. For place-based studies, these moderators included:
• Research design ("matched" vs. "unmatched" designs)
• Geographic size (micro place vs. neighborhood/police beat vs. district/precinct vs. entire city)
• Geographic location (US vs. Europe)

For studies assessing mental health outcomes, these moderators included:
• Research design ("adjusted" vs. "unadjusted" estimates)
• Sample demographics (youth sample vs. adult sample)
• Geographic location (US vs. Europe)

For studies assessing individual attitudes toward the police, these moderators included:
• Research design ("adjusted" vs. "unadjusted" estimates)
• Sample demographics (youth sample vs. adult sample)
• Geographic location (US vs. Europe)

We did not employ moderator analyses for physical health outcomes or self-reported crime/delinquency, given the small number of studies included in these models.
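Most of the moderators listed above are two-level contrasts (for example, "adjusted" vs. "unadjusted" estimates). One common way to compare subgroups is a Q-between test on the subgroup pooled estimates, sketched below in Python. It assumes that a separate random-effects model has been fitted within each subgroup and that its pooled estimate and standard error are available. This is an illustration only; the review's moderator models may have been specified differently (for example, as meta-regressions in metafor). The function names are hypothetical, and the p-value helper covers only the one-degree-of-freedom (two-subgroup) case.

```python
import math


def subgroup_difference(estimates, ses):
    """Q-between test for differences among subgroup pooled estimates.

    estimates: pooled effect (e.g., logged RIRR) within each subgroup
    ses:       standard error of each subgroup estimate
    Returns (Q_between, df).
    """
    w = [1.0 / s ** 2 for s in ses]
    overall = sum(wi * m for wi, m in zip(w, estimates)) / sum(w)
    q_between = sum(wi * (m - overall) ** 2 for wi, m in zip(w, estimates))
    return q_between, len(estimates) - 1


def chi2_sf_df1(q):
    """Upper-tail chi-square probability for 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2.0))


# Hypothetical "matched" vs. "unmatched" pooled estimates.
q_b, df_b = subgroup_difference([0.0, 1.0], [0.5, 0.5])
p = chi2_sf_df1(q_b) if df_b == 1 else None
```

With more than two subgroups, such as the four geographic size categories, Q_between would instead be compared against a chi-square distribution with three degrees of freedom.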

| Sensitivity analysis
In addition to the RVE models previously described (see Section 4.3.5), several sensitivity analyses were conducted. One study measuring attitudes toward the police produced a large effect size that was an apparent outlier in the forest plot for this outcome (Singer, 2013). […]

| Deviations from protocol
In the protocol for this review (Weisburd et al., 2021), we indicated that we would explore differences in effect sizes by racial/ethnic composition and by crime type of focus (e.g., violent vs. drug crime). Unfortunately, too few studies for any specific outcome measure provided separate effect size estimates for racial or ethnic categories. More commonly presented was the demographic and ethnic composition of treatment and control groups in terms of group proportions or percentages. We considered using these data to construct a measure of relative racial difference for treatment groups compared to control groups for each study, and then employing this measure as a continuous independent variable in a meta-regression. However, in nearly all cases, researchers controlled for the effect of race/ethnicity during their analyses. Thus, using racial composition as a moderator to explain effect sizes that are already adjusted for race and ethnicity may fail to find a significant relationship for artificial reasons. In addition, few studies within any given outcome grouping provided information on the racial composition of both treatment and control groups, and there was often little variability in these racial compositions, with treatment samples primarily composed of individuals who belonged to a minority group. Regarding crime type, all eligible studies presented either a single measure of crime, an aggregate measure of violent crime, or an overall aggregate measure of crime. In other words, there was little consistent variation in the types of crime analyzed (e.g., few studies measured property crime or disorder).
The initial inclusion criteria for this review specified that eligible interventions must be targeted at a geographic area. However, we identified a considerable number of studies measuring the effect of pedestrian stops on individuals. These studies rarely focus on police intent or provide information suggestive of any specific geographic policing intervention, and thus we expanded our inclusion criteria from what was originally described to include these studies. Finally, there were several outcome measures mentioned in the initial protocol that we were unable to analyze due to a lack of eligible studies (violence in police-citizen encounters, officer misbehavior, fear of crime, etc.).⁷ Studies measuring community attitudes toward the police (i.e., attitudes of individuals residing in targeted areas who were not directly subject to a police stop) were rare, and there was considerable variation in the specific measures used across studies. As such, we were unable to consistently generate appropriate effect size estimates and chose instead to review these results narratively. However, results of meta-analytic models are presented for all other listed outcomes.
Our protocol also stated that risk of bias ratings would be determined using the Cochrane risk of bias tools (J. A. Sterne et al., 2016; J. A. C. Sterne et al., 2019). Although the items we used to assess risk of bias were adapted from these tools, we did not attempt to use them in their entirety or strictly follow the logic laid out by these tools. While this may raise concerns about replicability, deviations from this approach were necessary to tailor our items to the issues most relevant to this body of research. We detail the logic of our risk of bias ratings in Section 4.3.3.

⁶ Unfortunately, measures of the satisfaction or procedural justice associated with a police stop were not reported frequently or consistently enough within any given outcome grouping to permit a separate moderator or subgroup analysis.

⁷ Note, however, that one study (Boydstun, 1975) reported the number of citizen complaints both before and after a pedestrian stops intervention, but no complaints were filed in either time period.

Of the 10 eligible studies measuring crime and disorder outcomes, nine were included in our meta-analysis (see Table 2).⁹ This collection of studies contained a mixture of proactive policing interventions and retrospective evaluations of natural variation in the use of pedestrian stops. For example, five studies assessed the impact of interventions explicitly manipulating pedestrian stops, sometimes in the context of more general proactive policing interventions within specific areas (Boydstun, 1975; Cohen & Ludwig, 2003; McGarrell et al., 2002; Ratcliffe et al., 2011; […]). […] (Boydstun, 1975). Both the San Diego field interrogation study and the Indianapolis directed patrol experiment also contained multiple intervention arms. In the San Diego study, a separate treatment area received specialized training intended to reduce friction with citizens during stops.
In the Indianapolis study, the East target areas (rather than the North target areas) used a less selective approach to gun crime enforcement that was more focused on the broad application of traffic stops. In this review we do not include or discuss the impact of the specialized field training that occurred in San Diego […].

[…] Wellbeing survey (FFCWS) measured treatment by asking youth whether they had ever been stopped by police "while on the street, at school, in a car, or some other place" (Jackson, Testa, & Vaughn, 2020, p. 753), and youth in the National Evaluation of the Gang Resistance Education and Training program (GREAT) were asked how many times in the past 6 months they had been stopped by the police for questioning (though this variable was generally dichotomized). In the FFCWS, this measure was often taken during the year 15 wave (i.e., when respondents were roughly 15 years old), and the GREAT survey administered this item during the second/third waves of data collection, when the youth were generally 12 years of age or older (see […]). […] they were asked if they had ever been "unfairly stopped, searched, or questioned by police" (Dennison & Finkeldey, 2021, p. 263). […]

Ten individual-level studies measured mental health outcomes.
These outcomes commonly included anxiety (Geller, 2017; Geller et al., 2014), depression (Baćak & Nowotny, 2020; Hirschtick et al., 2020; Turney, 2021), suicidality (Dennison & Finkeldey, 2021; […]), and PTSD symptoms (Geller, 2017; Hirschtick et al., 2020; Lewis & Wu, 2021). All such studies used self-reported questionnaire or interview surveys, often incorporating items from validated medical instruments. Similar procedures were used across the five studies measuring physical health outcomes, which included self-reported poor health (Baćak & Apel, 2020; McFarland et al., 2019) and sleep problems (Jackson, Testa, Vaughn, & Semenza, 2020; Testa et al., 2021). Attitudes toward the police and self-reported crime/delinquency were also measured using self-report surveys and interviews. Common outcomes for attitudes toward the police included scaled or ordinal measures of police legitimacy (Baćak & Apel, 2021; Murray et al., 2021; Tyler et al., 2014), respect (Friedman et al., 2004; Harris & Jones, 2020; Singer, 2013), trust (Friedman et al., 2004; Murray et al., 2021; Singer, 2013), satisfaction (Wheelock et al., 2019), and overall negative attitudes (Rosenbaum et al., 2005; Swaner & Brisman, 2014). […] (Geller, 2017). However, at other times these outcomes were conceptually similar. For example, both Geller (2017) […].

Finally, four studies met our inclusion criteria but were too conceptually dissimilar from the studies described above to include in our meta-analysis. Two studies used self-report surveys from the FFCWS to measure respondents' degree of legal cynicism (Hofer et al., 2020; […]). Here, legal cynicism involved attitudes toward multiple aspects of the legal and criminal justice systems, rather than toward the police alone. Thus, while we considered legal cynicism to be an important outcome, we did not synthesize it with studies measuring attitudes toward the police.
One study measured community members' perceived sense of safety, comparing individuals who had been stopped by police in the past 6 months to those who had not (Kochel & Nouri, 2021). […] Finally, several studies published after our 2021 deadline that would otherwise have met our eligibility criteria were recommended by subject-matter experts. While we excluded these studies from our meta-analysis and main results, we discuss their general findings and their implications for the results of our review in Section 5.5.

| Risk of bias in included studies
Our risk of bias ratings for geographic crime and disorder studies can be seen in Table 3. Overall, we considered these studies to be at moderate risk of bias toward treatment. All studies evidenced temporal ordering.
Only one study used random assignment (Ratcliffe et al., 2011), and only three others selected control areas based on their comparability to treatment areas (Boydstun, 1975; McGarrell et al., 2002; […]). The remaining studies compared treatment areas to the remainder of a jurisdiction or sample not receiving treatment. Of these studies, Weisburd et al. (2016) used a strong instrumental variable approach to account for treatment endogeneity and to reduce potential bias. Treatment areas in non-experimental studies were often selected based on high baseline crime rates, increasing the risk of bias toward treatment. However, researchers generally controlled for these baseline differences using multiple regression and/or difference-in-difference analyses (see Cohen & Ludwig, 2003; MacDonald et al., 2016; McCandless et al., 2016). Only one study required the calculation of an effect size using unadjusted and unmatched data (Murray, 2014). This study produced an effect size that was largely null, however, and did not appear to be biased toward treatment. Several studies encountered minor issues with data collection or program implementation, such as the redrawing of area boundaries after the start of the intervention (Boydstun, 1975), alternative interventions taking place during the study evaluation period (McGarrell et al., 2002), minor treatment contamination (Ratcliffe et al., 2011), or the suspension of funding during the study period […], but there was no evidence of any major issues with implementation or data accuracy that were likely to impact study findings.
Risk of bias ratings for person-based studies can be seen in Table 4.
Here, we do not include an item about implementation failures as there was generally little to no information about the intervention itself (i.e., the police stop). For outcomes involving attitudes toward the police, mental health, and physical health, we consider these studies to be at high risk of bias overall. None of the person-based studies used random allocation.
Many of these studies also identified significant baseline differences in the demographic composition of treatment and control groups, while several other studies did not provide descriptive information to compare the two groups. For example, many studies found that Black and male respondents were more likely to be stopped by police than White and female respondents (see e.g., Dennison & Finkeldey, 2021; Friedman et al., 2004; Geller, 2017; Singer, 2013; Wheelock et al., 2019). Given this, our ratings concerning the appropriateness of the statistical analysis were primarily concerned with the inclusion of these characteristics as covariates. Most person-based studies analyzed outcomes using various forms of multiple regression that included control variables related to demographic, economic, and/or behavioral differences between groups (see Dennison & Finkeldey, 2021; Geller et al., 2014; Harris & Jones, 2020). However, a subset of studies measuring attitudes toward the police used unadjusted bivariate analyses, presenting considerably higher risk of bias toward control groups (see Friedman et al., 2004; Singer, 2013). There was also concern regarding attrition and/or nonresponse bias across all person-based studies. However, there was generally no information presented to suggest that attrition or nonresponse differed between individuals who were stopped by police and those who were not.
The most pressing issue facing our collection of person-based studies involved temporal ordering. Because the majority of studies analyzed cross-sectional data, or longitudinal data in which the independent and dependent variables were measured during the same wave of data collection, there was often no clear way to establish the order of these variables across time. While matching subjects on factors that may make them more or less likely to be stopped (or simply controlling for these factors via regression models) helps to reduce this concern, there remains a potential for reverse causality. That is, the presence of mental health issues or negative attitudes toward the police may lead to increased police stops, rather than vice versa. Only one study measuring attitudes toward the police incorporated both a pre- and post-stop outcome measure (Rosenbaum et al., 2005).
10 As an example, chapter 7 in Bradford (2017) …

| Effects of the intervention
In total, we analyzed 58 effect sizes across six outcome groupings (including sensitivity analyses), representing 90,904 people and 20,876 places. The summary effect sizes for each outcome can be seen in Table 5. For relative incident rate ratios and odds ratios, values less than 1 indicate a decrease in incidence or odds for treatment groups relative to control groups. As seen in Figure 2, pedestrian stop interventions were associated with a statistically significant 13% (p < 0.001) reduction in crime for treatment areas relative to control areas. CIs for this outcome suggest a crime reduction effect ranging from 9% to 16%.
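Percentage changes such as "a 13% reduction" are back-transformations of a pooled ratio: an RIRR of 0.87 corresponds to (0.87 − 1) × 100 = −13%. The sketch below illustrates that pipeline under stated assumptions and is not the review's actual code. It pools hypothetical log-scale ratios with a random-effects model, estimating the between-study variance τ² by REML through a simple fixed-point iteration of the REML estimating equation, and then converts the pooled ratio and its 95% CI into percentage changes. All function names are hypothetical.

```python
import math


def reml_random_effects(y, v, tol=1e-10, max_iter=1000):
    """Random-effects pooling of log-scale effect sizes (hypothetical helper).

    y: effect sizes on the log scale (e.g., log RIRR or log OR)
    v: their sampling variances
    Returns (pooled mean, standard error, REML estimate of tau^2).
    """
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1.0 / (vi + tau2) for vi in v]
        sw = sum(w)
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sw2 = sum(wi ** 2 for wi in w)
        # Fixed-point form of the REML estimating equation for tau^2
        new_tau2 = sum(
            wi ** 2 * ((yi - mu) ** 2 - vi) for wi, yi, vi in zip(w, y, v)
        ) / sw2 + 1.0 / sw
        new_tau2 = max(0.0, new_tau2)  # between-study variance cannot be negative
        converged = abs(new_tau2 - tau2) < tol
        tau2 = new_tau2
        if converged:
            break
    w = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    return mu, se, tau2


def ratio_to_percent(log_ratio):
    """Convert a log rate or odds ratio into a percentage change."""
    return (math.exp(log_ratio) - 1.0) * 100.0


def pooled_ratio_summary(y, v, z=1.96):
    """Pooled ratio, percentage change, and 95% CI expressed in percent."""
    mu, se, tau2 = reml_random_effects(y, v)
    return {
        "ratio": math.exp(mu),
        "percent_change": ratio_to_percent(mu),
        "ci_percent": (ratio_to_percent(mu - z * se), ratio_to_percent(mu + z * se)),
        "tau2": tau2,
    }


# Hypothetical study-level RIRRs and variances (not data from this review)
rirrs = [0.82, 0.90, 0.85, 0.95]
variances = [0.004, 0.006, 0.003, 0.010]
summary = pooled_ratio_summary([math.log(r) for r in rirrs], variances)
print(summary["percent_change"], summary["ci_percent"])
```

Because the pooling happens on the log scale, the back-transformed CI is asymmetric around the point estimate. The same conversion applies to odds ratios, where 1.46 reads as a 46% increase in the odds.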

| Crime and displacement
There is also a notable lack of heterogeneity in these effect sizes. All effect sizes tend to favor treatment with overlapping CIs, and between-study heterogeneity was not statistically significant, as indicated by the Q statistic.
As seen in Figure 3, pedestrian stop interventions were also associated with a statistically significant 7% (p < 0.001) reduction in crime for displacement areas relative to control areas, with CIs ranging from a 4% decrease in crime to a 9% decrease in crime. There was also a lack of significant or excess heterogeneity in these displacement effect sizes.
FIGURE 2 Crime effects for place-based studies
FIGURE 3 Displacement effects for place-based studies
The eight effect sizes shown in Figure 4 suggest that being stopped by police was associated with a statistically significant 46% (p < 0.001) increase in the odds of a mental health issue, with CIs ranging from a 24% increase to a 72% increase. All effect sizes favored control groups, though there was significant heterogeneity in effect size estimates, as roughly 78% of the total variance could be attributed to between-study variance.
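Heterogeneity statements in this section rest on Cochran's Q and on the share of total variability attributable to between-study differences. The sketch below assumes the "roughly 78%" figure is an I²-type quantity, I² = max(0, (Q − df)/Q), which the text does not name explicitly. It computes Q with inverse-variance (fixed-effect) weights; the function names are hypothetical.

```python
def cochran_q(y, v):
    """Cochran's Q using inverse-variance (fixed-effect) weights."""
    w = [1.0 / vi for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))


def i_squared(q, k):
    """Percentage of total variability attributed to between-study heterogeneity."""
    df = k - 1
    if q <= 0:
        return 0.0
    # Values of Q at or below its degrees of freedom are truncated to 0%
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical log odds ratios and variances for illustration only
log_ors = [0.10, 0.55, 0.30, 0.80]
variances = [0.02, 0.03, 0.01, 0.05]
q = cochran_q(log_ors, variances)
print(q, i_squared(q, len(log_ors)))
```

A p-value for the Q test can be obtained from a chi-square distribution with k − 1 degrees of freedom (for example, `scipy.stats.chi2.sf(q, k - 1)` if SciPy is available).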
As seen in Figure 5, the four studies measuring physical health outcomes provided similar results. Overall, there was a statistically significant 36% (p < 0.001) increase in the odds of a physical health issue for treatment groups relative to control groups, and the CI for this outcome suggests that likely effects range from a 14% to a 62% increase. All four studies showed significant effects favoring control, though there remains statistically significant between-study heterogeneity. Despite the strong and significant backfire effects indicated by these mental and physical health analyses, it is important to reiterate the inherent difficulties and potential biases involved in measuring these outcomes. Causal interpretations should be made cautiously.

| Attitudes toward the police
Individuals subjected to pedestrian stops, particularly those that are perceived as false or unfair, may harbor resentment and negative future attitudes toward the police. Our nine eligible studies measuring attitudes toward the police are displayed in Figure 6.
Hedges' g effect sizes were used for these outcomes given their often … Results from Figure 6 indicate that pedestrian stops were associated with a statistically significant small to moderate decrease in attitudes favorable to the police (g = −0.38, 95% CI: −0.59, −0.17, p < 0.001). The classification of this effect size as small to moderate is based on the conventions suggested by Cohen (1992); however, outside of laboratory settings this effect may be considered rather large (Lipsey et al., 2012). Using the binomial effect size display to convert this effect into a percentage point difference suggests an 18.6% differential between control and treatment groups. Eight of nine effect sizes for this outcome favored control groups; however, there is also a very large degree of between-study variance. Over 97% of the total heterogeneity in this model can be attributed to heterogeneity between studies, and one study (Singer, 2013) displayed an unusually large effect size (which we return to in our sensitivity analyses). Once again, while this evidence implies a strong backfire effect of pedestrian stops on attitudes toward the police, the risk of bias toward control groups across these studies is generally high. Additionally, this level of heterogeneity suggests a large degree of uncertainty as to the true mean effect size.
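The binomial effect size display (BESD) re-expresses a standardized mean difference as two "success" rates centered on 50%. A minimal sketch follows, assuming the common conversion r = g / √(g² + 4), which treats g as Cohen's d and assumes roughly equal group sizes. The review may have used a conversion that accounts for actual group sizes, so this only approximates the reported figures. The function names are hypothetical.

```python
import math


def besd_rates(g):
    """BESD rates (percent) for treatment and control, given a standardized mean difference.

    Uses r = g / sqrt(g**2 + 4), an equal-group-size approximation,
    then returns 50 + 50r and 50 - 50r.
    """
    r = g / math.sqrt(g ** 2 + 4.0)
    return 50.0 + 50.0 * r, 50.0 - 50.0 * r


def besd_differential(g):
    """Absolute percentage point gap between the two BESD rates (equal to 100|r|)."""
    treatment, control = besd_rates(g)
    return abs(treatment - control)


print(besd_differential(-0.38))  # attitudes toward the police
print(besd_differential(0.30))   # self-reported crime/delinquency
```

Under this approximation, g = −0.38 gives about 18.7 percentage points and g = 0.30 about 14.8, close to the 18.6% and roughly 15% differentials reported in this section; small gaps are expected from rounding of g and from the equal-n assumption.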

| Self-reported crime/delinquency
If pedestrian stops result in the imposition of a formal label that leads to the exclusion of individuals from conventional bonds and activities, then we may also expect to see a backfire effect in terms of specific deterrence. Results from the four eligible studies comparing self-reported crime/delinquency for individuals stopped by police to individuals not stopped by police are shown in Figure 7. Here, effects to the right of the no-effect reference line indicate increases in self-reported crime/delinquency for treatment groups relative to control groups, and are thus defined as effects favorable to control. The combined sample size for this outcome was 11,402.
Results from this analysis continue to suggest deleterious individual-level effects of pedestrian stops. Specifically, there was a statistically significant increase in self-reported crime/delinquency for treatment groups relative to control groups (g = 0.30, 95% CI: 0.12, 0.48, p < 0.001).
Using the binomial effect size display to convert this effect into a percentage point difference suggests an approximate 15% differential between control and treatment groups. All four effect sizes reported here favored control, though there remains a statistically significant amount of between-study heterogeneity.

| Violence in police-citizen encounters
We did not locate any eligible studies providing measures of violence in police-citizen encounters.

| Officer misbehavior
Only one eligible study provided a potential measure of officer misbehavior. The San Diego field interrogation experiment measured citizen complaints against the police both before and after the intervention; however, there were no complaints during either time period (Boydstun, 1975).

| Sensitivity analyses
We conducted several robustness checks to assess the sensitivity of our results to different specifications. As previously noted, our main models included a selection of one effect size per study/sample. At times, this selection of effect sizes could be considered arbitrary, which raises concern over the potential for these selections to bias our results. Thus, we conducted sensitivity analyses using RVE that incorporated all calculated effect sizes taken from each sample and outcome grouping. These models were only estimated for mental health outcomes, attitudes toward the police, and self-reported crime/delinquency, as these were the only outcomes for which difficult effect size selections were often required. For each model, standard errors were clustered by sample, resulting in eight unique clusters for mental health outcomes, nine clusters for attitudes toward the police, and four clusters for self-reported crime/delinquency.
Results from our RVE models are displayed in Table 6. For mental health issues and attitudes toward the police, RVE models continued to suggest a statistically significant effect favorable to control groups.
The mean effect size for mental health studies decreased slightly (from a 46% increase in our main specification to a 37% increase in the RVE model), while the mean effect size for attitudes toward the police increased slightly (from g = −0.38 in our main specification to g = −0.40 in the RVE model). Results for self-reported crime/delinquency remained similar in magnitude (from g = 0.30 in our main specification to g = 0.26 in the RVE model), but these results were no longer significant at the 0.05 threshold. However, the degrees of freedom for this model were fewer than four, which is considered an unreliable sample size for RVE estimation (see Tanner-Smith et al., 2016).
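The core idea of robust variance estimation can be sketched for an intercept-only model: effect sizes are weighted with correlated-effects weights, w = 1 / (k_j (v̄_j + τ²)) for cluster j with k_j effect sizes, and the standard error comes from a cluster-level sandwich that does not need the within-sample correlations. This is a simplified CR0 version under stated assumptions. It takes τ² as a user-supplied input and omits the small-sample corrections to the standard error and degrees of freedom (Tipton, 2015) implemented in software such as the R package robumeta. Those corrected degrees of freedom are what the "fewer than four" guideline refers to. The function name is hypothetical.

```python
import math
from collections import defaultdict


def rve_intercept(y, v, cluster, tau2=0.0):
    """Intercept-only RVE with correlated-effects weights and a CR0 sandwich SE.

    y, v: effect sizes and sampling variances
    cluster: study/sample identifier for each effect size
    tau2: between-study variance, supplied by the caller in this sketch
    Returns (weighted mean, robust standard error, number of clusters).
    """
    groups = defaultdict(list)
    for i, c in enumerate(cluster):
        groups[c].append(i)

    # Correlated-effects weights: each cluster's total weight is shared by its k effects
    w = [0.0] * len(y)
    for idx in groups.values():
        k = len(idx)
        v_bar = sum(v[i] for i in idx) / k
        for i in idx:
            w[i] = 1.0 / (k * (v_bar + tau2))

    sw = sum(w)
    b = sum(wi * yi for wi, yi in zip(w, y)) / sw

    # "Meat" of the sandwich: squared sum of weighted residuals within each cluster
    meat = 0.0
    for idx in groups.values():
        s = sum(w[i] * (y[i] - b) for i in idx)
        meat += s * s
    se = math.sqrt(meat) / sw
    return b, se, len(groups)


# Hypothetical example: several effect sizes from the same sample share a cluster
print(rve_intercept([0.30, 0.25, 0.10, 0.40], [0.02, 0.02, 0.03, 0.05], ["s1", "s1", "s2", "s3"]))
```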
Our main model specification for attitudes toward the police also suggested the presence of an outlier (Singer, 2013).

| Subgroup analyses
The examination of effect size moderators provides important context to the interpretation of meta-analytic findings (see Johnson et al., 2015). As such, we explore several factors that may moderate treatment effects across each of our outcome groupings. While many systematic reviews of crime and justice interventions compare effect sizes for experimental and quasi-experimental studies (e.g., Hinkle et al., 2020), we lacked a sufficient number of randomized experiments to conduct such an analysis. Thus, to assess the effect of risk of bias on study findings, we compare effect sizes for "matched" and "unmatched" designs (for crime/disorder and mental health outcomes) and for "adjusted" and "unadjusted" designs (for attitudes toward the police). Other moderators include the geographic size of the targeted areas (for crime/disorder outcomes), youth versus adult samples (for mental health outcomes and attitudes toward the police), and the geographic location of the study (for all outcomes). Moderator analyses are not conducted for spatial displacement, physical health, or self-reported crime/delinquency given the small number of studies included in these models.
Categorical moderator analyses were conducted using the analog to the ANOVA method (Lipsey & Wilson) and continuous moderator analyses were conducted using meta-regression (Higgins et al., 2020).
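The analog to the ANOVA method partitions the total heterogeneity Q into a between-group component (Q_between, sometimes reported as Q model) and the sum of within-group components; Q_between is compared with a chi-square distribution with (number of groups − 1) degrees of freedom. The sketch below is an illustration, not the review's code. Its `tau2` argument adds one common between-study variance to every sampling variance, a simple mixed-effects variant; `tau2=0` gives the fixed-effect version. Names and group labels are hypothetical.

```python
from collections import defaultdict


def q_stat(y, v):
    """Cochran's Q with inverse-variance weights."""
    w = [1.0 / vi for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))


def analog_anova(y, v, group, tau2=0.0):
    """Return (Q_between, df_between, Q_within) for a categorical moderator."""
    v_star = [vi + tau2 for vi in v]
    groups = defaultdict(list)
    for i, g in enumerate(group):
        groups[g].append(i)
    q_total = q_stat(y, v_star)
    # Sum of the Q statistics computed separately inside each moderator category
    q_within = sum(
        q_stat([y[i] for i in idx], [v_star[i] for i in idx])
        for idx in groups.values()
    )
    q_between = q_total - q_within
    df_between = len(groups) - 1
    return q_between, df_between, q_within


# Hypothetical log RIRRs coded by design type
print(analog_anova([-0.10, -0.12, -0.20, -0.22], [0.01, 0.02, 0.01, 0.03],
                   ["unmatched", "unmatched", "matched", "matched"]))
```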

| Research design
Studies with weaker methodological rigor have been shown to produce larger effect size estimates than those with stronger methodological rigor (Weisburd et al., 2001). To test the potential for methodological strength to impact our crime/disorder and mental health findings, we compared effect sizes for studies with matched versus unmatched designs. Here, "matched" does not necessarily indicate a statistical matching procedure, but rather any attempt to identify comparable control areas. Results of these moderator analyses can be seen in Table 7. For crime and disorder outcomes, unmatched designs were associated with a 10% decrease in crime for treatment areas relative to control areas, while matched designs were associated with a 19% decrease. This difference was non-significant and both effect sizes remained statistically significant individually (as indicated by the 95% CIs). Of note, if we consider Weisburd et al. (2016) to be an unmatched design, the difference between matched and unmatched effect sizes increases in magnitude and becomes statistically significant. However, we find this distinction to be misleading, as Weisburd et al. used an instrumental variable approach that is likely stronger than any of the non-statistical matching procedures used in our other studies. For mental health outcomes, unmatched studies were associated with a 49% increase in the odds of a mental health issue for treatment groups relative to control groups, while matched designs were associated with a 43% increase. Once again, this difference was non-significant and both effect sizes remained statistically significant individually, with 95% CIs greater than one.
No eligible studies for attitudes toward the police employed matching procedures. However, several studies provided only unadjusted bivariate data from which an effect size could be calculated (Friedman et al., 2004; Singer, 2013; Tyler et al., 2014).
Thus, to assess risk of bias for these studies we compared effect sizes for adjusted and unadjusted estimates. Results from this analysis can be seen in Table 8. While adjusted effect sizes were notably smaller than unadjusted effect sizes, by an average of g = 0.26 (95% CI [−0.17, 0.68]), this difference was not statistically significant and both categories of studies remained significantly different from 0.
Note: Q model tests whether a significant amount of heterogeneity is explained by the moderator. Abbreviations: CI, confidence interval; OR, odds ratio; RIRR, relative incident rate ratios.
12 Here we mean that we were not able to derive a more general effect estimate from these studies, and that the effect sizes used in our analysis represent the effect of being both stopped and stopped in an unfair, false, or dissatisfied manner.
13 We also include our sole block randomized experiment …

| Size of geographic area
Weisburd et al. (2014) suggest that the use of pedestrian stops is often targeted at high-crime micro-geographic areas. If so, then the mere increase in police presence within hot spots of crime and disorder may be responsible for any observed crime reduction effect (see Braga …). To test for this potential, we conducted a moderator analysis comparing effect sizes for studies targeting micro-geographic areas, neighborhoods/police beats, police districts/precincts, and macro-geographic areas (e.g., entire cities). Given the small number of studies within each of these categories, we treat geographic size as a continuous variable and estimate this moderator analysis as a meta-regression.
Results of this analysis are shown in Table 9. On average, increases in the size of the geographic area targeted led to decreases of between 3% and 4% in effect size estimates (i.e., larger areas received smaller crime reduction benefits); however, this linear effect was not statistically significant (RIRR = 1.04, 95% CI [0.977, 1.105]). Of note, the mean effect sizes for all groups other than macro-geographic areas displayed CIs entirely below one, indicating statistical significance. However, we urge caution when interpreting these effects, given the small number of studies in each grouping.
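A meta-regression with a single continuous moderator is a weighted least squares fit of the log effect sizes on the moderator. Exponentiating the slope gives the multiplicative change in the RIRR per one-unit step, so a slope of log(1.04) means each step toward larger areas multiplies the RIRR by 1.04, pulling it toward 1 and implying a weaker crime reduction. The sketch assumes a hypothetical ordinal coding (1 = micro-geographic, 2 = neighborhood/police beat, 3 = district/precinct, 4 = macro-geographic) and weights 1 / (v + τ²) with τ² supplied by the caller. The review's exact coding and τ² estimator are not stated here.

```python
import math


def meta_regression(y, v, x, tau2=0.0):
    """Weighted least squares meta-regression of y on one moderator x.

    Weights are 1 / (v + tau2); tau2 > 0 gives a mixed-effects model.
    Returns (intercept, slope, standard error of the slope).
    """
    w = [1.0 / (vi + tau2) for vi in v]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)  # known-variance WLS standard error
    return intercept, slope, se_slope


# Hypothetical log RIRRs by area-size code (not data from this review)
sizes = [1, 1, 2, 3, 4]
log_rirr = [math.log(r) for r in (0.80, 0.84, 0.86, 0.90, 0.97)]
variances = [0.004, 0.006, 0.005, 0.008, 0.010]
a, b, se = meta_regression(log_rirr, variances, sizes, tau2=0.002)
print(math.exp(b), math.exp(b - 1.96 * se), math.exp(b + 1.96 * se))
```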

| Youth versus adult samples
Concern regarding the deleterious impact of pedestrian stops is particularly relevant for adolescents, as these populations may be especially vulnerable to stressful/traumatic experiences and the imposition of formal labels (Geller, 2017; …). For mental health outcomes and attitudes toward the police, there was sufficient variation in the samples used to compare the effects of pedestrian stops for youth and adults. The results of this analysis can be seen in Table 10.
For mental health outcomes, youth samples were associated with a 74% increase in the odds of a mental health issue for treatment groups relative to control groups, while adult samples were associated with only a 32% increase. This difference was nearly statistically significant at the 0.05 level (…).

| Geographic location
Per the protocol for this review, we also examined the difference in mean effect sizes by geographic location. Given that several studies …
Additionally, all moderator analyses were limited by a small number of studies.

| Studies not included in meta-analyses
While the primary objective of this review was to examine the impact of pedestrian stops on crime, the community, and the individuals subjected to stops, several relevant studies and outcomes could not be included in our meta-analysis. Since the number of these studies was small, we opted to review their results narratively. Our findings overall are consistent with those of the studies meta-analyzed. That is, pedestrian stops appear to negatively affect individual-level attitudes toward the police and the legal system while simultaneously producing a general deterrent effect on crime and disorder. However, place-based studies incorporating community surveys provide additional insight, suggesting that the deleterious effects of pedestrian stops may be limited to those directly subject to the intervention rather than the community more broadly.
Two eligible studies using the same longitudinal survey sample included an outcome of legal cynicism (Hofer et al., 2020; …). Given that this outcome was operationalized as a composite measure representing attitudes toward the legal system more broadly, we considered it too conceptually distinct to synthesize with studies measuring attitudes toward the police. Both Hofer et al. (2020) and Jackson, Testa, and Vaughn (2020) …
Two studies, conducted by Kochel and Nouri (2021) and Lerman and Weaver (2014), … Lerman and Weaver used nonemergency 311 calls as a proxy for engagement, comparing precincts above and below the mean stops per capita.
Ultimately, Lerman and Weaver found that "high stop" precincts were associated with significantly more 311 requests, though this finding was attenuated by the proportion of stops that resulted in force.

| Community surveys from place-based studies
Four place-based studies included community surveys to assess the impact of police activity on community members as a secondary outcome measure. … findings from the Kansas City gun experiment. Target area residents surveyed both before and after the intervention reported being more satisfied with their neighborhood, being less fearful of crime, and perceiving lower rates of disorder and drug crime compared to residents from the control area (…; see also Shaw, 1995).
Finally, Boydstun (1975) …
… (2022) and Testa et al. (2022) …
No moderator analysis based on publication status was conducted for attitudes toward the police, as no eligible studies were unpublished. In sum, there is limited evidence of publication bias in our results.
Any potential bias appears to be minor and not substantively meaningful for our overall results.

| Summary of main results
The results of this systematic review and meta-analysis point to both intended and unintended effects of pedestrian stop interventions.
Analyzing 58 effect sizes across six discrete outcome groupings, we find that pedestrian stops lead to a reduction in crime at the geographic level but produce deleterious effects on the health, behavior, and attitudes of the individuals stopped by police. Taken together, our results suggest that pedestrian stops can be an effective crime control strategy, but one that comes with considerable drawbacks. Given the observed backfire effects in terms of individual health, attitudes, and behavior, it is not clear whether these interventions lead to any long-term net gain or produce benefits that justify their non-monetary costs. Our results also raise questions as to the mechanisms through which police stops may reduce crime. One common belief is that pedestrian stops produce a specific deterrent effect, or that individuals subject to a stop will alter their behavioral patterns to avoid future police interaction (see Stafford & Warr, 1993).
However, our finding of backfire effects on self-reported crime/delinquency, coupled with area-level decreases in crime, suggests that any deterrent effect associated with pedestrian stops may be more general in nature. Given that police stop interventions often involve increased police presence in high-crime areas, these findings may also highlight the potential confounding of police stops with police presence in the production of general deterrence. Despite this potential, we urge caution in the interpretation of our findings, particularly as they relate to person-based studies. There is both a significant amount of heterogeneity in effect size estimates for many outcome measures and considerable risk of bias toward control groups. Given the issues associated with establishing proper temporal ordering between pedestrian stops and person-based outcomes, and the difficulty involved with statistically controlling for an individual's likelihood of being stopped by the police, there remains a possibility of reverse causality. There was also an overall lack of random assignment in person-based studies and only one experimental evaluation assessing place-based crime outcomes, which greatly limits the potential to make strong causal inferences. Nonetheless, while there is a need for further research on the effects of pedestrian stops, the direction of effects across all outcome groupings is highly consistent.
6.2 | Overall completeness and applicability of evidence

| Quality of the evidence
The overall quality of the evidence included in this review is low by conventional standards (see Weisburd et al., 2001), and the risk of bias toward control groups was deemed to be high for most outcome groupings. Only one eligible study used random allocation, and the majority of remaining studies relied on multiple regression analyses to reduce the potential for selection bias. However, this approach is reliant on the ability to identify, observe, and measure all potentially confounding factors, and given this difficulty, the potential for omitted variable bias is an ever-present concern (see Bushway & Apel, 2010; …). For place-based studies of crime and disorder, roughly half of all included studies identified control areas based on considerations of comparability to treatment areas.
Similarly, half of our included studies on self-reported crime/delinquency employed propensity matching techniques to equate treatment and control individuals on their likelihood of being stopped by police. These groups of studies were also able to establish appropriate temporal ordering, either through the inclusion of pre- and post-intervention measures or by separating measurements into discrete waves of data collection. Thus, for individual and place-based studies of crime and delinquency, we considered the quality of evidence to be moderately high and risk of bias was not a significant concern.
However, a major quality concern for studies measuring health outcomes and attitudes toward the police is the lack of clear temporal ordering. Outcome variables in these studies (e.g., depression, poor health, police legitimacy) are generally measured during the same wave of data collection as personal experience with police stops. As such, it is often difficult to determine when health issues or negative attitudes toward the police developed and whether an individual's experience with pedestrian stops preceded the development of these conditions. Given that negative health conditions and attitudes toward the police may increase the likelihood that individuals come into contact with police in general (Thompson & Kahn, 2016), there is clear risk of bias toward control groups for these outcome measures. While stronger research designs controlling for baseline levels of mental or physical health and/or the inclusion of propensity score weighting (see Dennison & Finkeldey, 2021; Geller, 2017) report results that are highly consistent with those of our overall findings, there is considerable potential for the quality of existing evidence to impact the findings of this review.

| Limitations and potential biases in the review process
We conducted a number of rigorous search strategies to capture a broad range of published and unpublished research. While there were no specific limitations in our review process, we encountered some issues that limited our ability to calculate effect sizes and assess certain outcomes that were specified in our initial protocol. First, we were unable to calculate an effect size for one eligible place-based study measuring crime and disorder. Additionally, we were unable to meta-analyze outcomes related to community surveys, given a lack of clear conceptual overlap in these outcomes and in the forms of data reported. Second, we did not identify eligible studies providing dedicated assessments of violence in police-citizen encounters or officer misbehavior, and thus we are unable to speak to the effect of pedestrian stop interventions on these outcomes. Finally, we did not explicitly incorporate our risk of bias ratings into our meta-analysis.
However, these ratings largely overlapped with the methodological characteristics that we used during our moderator analyses.
6.5 | Agreements and disagreements with other studies or reviews
The deterrent effect of pedestrian stops within targeted patrol efforts is also consistent with extant reviews of "hot spot" policing interventions (Braga et al., 2019, p. 1), though this finding brings into question the mechanism of effect in these interventions. That is, pedestrian stops may lead to a reduction in crime because they involve a targeted increase in police visibility within high-crime areas rather than any deterrent effect produced by the stops themselves (see Weisburd et al., 2014). Unfortunately, we are unable to distinguish between these causal mechanisms in this review. A … personal attitudes, health, and behavior. However, our review extends these findings by providing a systematic search of studies and applying meta-analytic techniques.
7 | AUTHORS' CONCLUSIONS
7.1 | Implications for practice and policy
The findings from this systematic review and meta-analysis paint a complicated picture for practitioners and policymakers. On one hand, our results tend to support the long-held belief among law enforcement agencies that pedestrian stops constitute an important crime prevention tool (see Baker & Goldstein, 2012). Particularly when targeted at specific high-crime areas, pedestrian stop interventions are associated with significant and meaningful reductions in crime. On the other hand, our results also support perspectives that are critical of pedestrian stops as a crime prevention tactic (see Fagan & Davies, 2000; Gelman et al., 2007). We find strong and significant evidence to suggest that being stopped by police is associated with worsening mental and physical health, attitudes toward the police, and even elevated levels of personal offending and delinquent behavior. Furthermore, we find preliminary evidence to suggest that the deleterious effects of pedestrian stops on mental health outcomes are particularly pronounced for youth, who are simultaneously more vulnerable to these encounters and at an increased risk of experiencing them (Geller, 2017). While the current review did not include measures of racial disparity, it is also well established that minority populations are more likely to experience these forms of police contact (Fagan & Davies, 2000; MacDonald & Braga, 2019; Ridgeway, 2007). Thus, the negative individual-level impacts of pedestrian stops may be disproportionately concentrated within minority and/or disadvantaged populations, perhaps furthering pre-existing socioeconomic disadvantage and deepening the divide between police and community members.
Given these concerns, the central question for police agencies and policymakers is whether the positive effects produced by pedestrian stop interventions outweigh the negative effects, and whether agencies should use pedestrian stops, regardless of whether the intervention is effective.
In this regard, it is important to consider the findings of this review alongside those examining other proactive policing interventions. Recent reviews on hot spots policing and problem-oriented policing (POP) have reported crime reduction effects that are larger in magnitude than those reported here, without similar backfire effects on individual and community outcomes (see Hinkle et al., 2020). For example, Braga and Weisburd (2020) found that hot spots policing interventions were associated with a 16% reduction in crime, and Hinkle et al. (2020) found that POP interventions were associated with a 33.8% reduction in crime, for treatment areas relative to control areas. These tactics are also supported by a larger body of research with considerably stronger methodological rigor than the studies included in this review. Thus, law enforcement agencies seeking to employ proactive policing tactics to reduce crime and disorder should consider interventions involving increased police visibility alongside community engagement and problem-solving efforts (see …). These tactics hold promise for maximizing crime prevention while simultaneously increasing communication and cooperation with community members.
From a policy perspective, there is also still uncertainty as to the mechanism through which pedestrian stops reduce crime and disorder.
As the NAS panel on proactive policing noted, pedestrian stops are often confounded with the presence of directed patrol at high-crime areas, and it is possible that hot spots policing is responsible for some if not most of the observed crime reductions. While several existing studies find evidence to suggest a deterrent effect of stops themselves (MacDonald et al., 2016; McGarrell et al., 2000; …), others find evidence to suggest that the primary deterrent mechanism may be increased police presence (Braakman, 2022). For example, both McGarrell et al. (2000) and Sherman and Rogan (…) observed significant reductions in violent and gun-related crime following an increase in police stops but did not observe similar reductions in other types of crime that would still be subject to a general deterrent effect of police presence. These results led Sherman and Rogan to "refute the hypothesis of general deterrence due to more visible patrol presence" (p. 688). MacDonald et al. (2016) found that the crime reduction effect of pedestrian stops in New York City was limited to probable cause stops, rather than stops conducted based on more general suspicion. This suggests that stops may have a unique crime reduction effect, but that the overuse of stops is unlikely to lead to a greater reduction in crime. More recently, Braakman (2022) concluded that the deterrent effect of pedestrian stops was likely due to an increase in police presence, finding a significant reduction in anti-social behavior associated with pedestrian stops but no similar impact on violent crime. Thus, more research is needed on these mechanisms, as it is unclear whether pedestrian stops produce a deterrent effect independent of police presence alone.
Law enforcement agencies should also consider the nature of the contact between police officers and citizens during pedestrian stops.
While too few studies in our review compared control conditions with police stops of varying intrusiveness or satisfaction levels, there is evidence to suggest that the quality of police contact may be as important as the contact itself (see Harris & Jones, 2020; Mazerolle et al., 2013; Tyler et al., 2014). Indeed, several of our eligible studies find that the intrusiveness of a police stop (Harris & Jones, 2020), satisfaction with police contact (Baćak & Apel, 2021; Slocum et al., 2016), and perceptions of respect and procedural justice (Friedman et al., 2004; Slocum et al., 2016) may mediate the effect of these stops on individual-level outcomes. If so, police agencies may be able to mitigate the negative effects of pedestrian stop interventions by focusing on procedural justice during police-citizen encounters, though we are not presently able to draw such a conclusion. Support for this possibility comes from a recent three-city randomized trial that provided intensive procedural justice training to officers assigned to a procedural justice hot spots condition (as contrasted with untrained officers in the standard hot spots condition). That study found positive impacts on residents' views of police violence and harassment.
In sum, there are still important and understudied aspects of pedestrian stop interventions. However, current evidence indicates that the use of high-volume pedestrian stops leads to both meaningful reductions in crime and a broad range of negative effects for the individuals subject to these stops.

Implications for research
There is a clear need for additional research on pedestrian stop interventions, particularly using experimental or strong quasi-experimental methods. Future studies that separate personal experience with pedestrian stops, attitudes toward the police, and mental/physical health issues into separate waves of data collection (and/or employ pre- and post-intervention outcome measurements) would go a long way toward establishing temporal ordering and strengthening causal inferences about personal attitudes and health outcomes. Greater use of propensity score matching techniques, specifically for studies examining attitudes toward the police, is also needed to limit the potential for selection bias. This is especially true given the lack of random allocation in these studies and the feasibility issues likely involved in experimental analysis of pedestrian stops at the individual level. Furthermore, there is an apparent lack of high-quality research examining the effect of pedestrian stop interventions on violence and misbehavior in police-citizen interactions. If high-volume pedestrian stops lead to additional use-of-force incidents or citizen complaints, then the negative impacts of these interventions may be even broader than those presented in this review. In this regard, future efforts may benefit from including a synthesis of qualitative research that explores individuals' experiences and perceptions of police stops. In addition, existing research has largely been limited to contexts within the United States and the United Kingdom. Given evidence that similar strategies are being used in other parts of the world (Miller et al., 2008), future research is needed in these settings. Additional research with youth samples is also needed, as our ability to assess the unique effects of police stops on this demographic was limited.
Finally, additional studies separating the effect of pedestrian stops by racial/ethnic groupings and by levels of satisfaction/procedural justice associated with the police stop itself are needed.
Although there were too few studies of this nature in the current review to provide dedicated analyses, extant research and theory clearly indicate that race/ethnicity and the nature of police contact may be important moderating factors.

ROLES AND RESPONSIBILITIES
• Content: Petersen, Weisburd, Fay
To manage these potential conflicts of interest, Mazerolle will not be involved in the editorial or formal approval process for this protocol or the subsequent review, nor will she independently decide on study eligibility, code studies, or conduct statistical or risk of bias analyses.